The present invention relates to antigens, particularly in a
purified form, of the virus of lymphadenopathies (denoted below
by the abbreviation LAS) and of the acquired immuno-depressive
syndrome (denoted below by the abbreviation AIDS), and to a
process for producing these antigens, particularly antigens of
the envelopes of these viruses. The invention also relates to
polypeptides, whether glycosylated or not, encoded by said DNA
sequences.

The causative agent of LAS or AIDS, a retrovirus, has been
identified by F. BARRE-SINOUSSI et al., Science, 220, 868
(1983). It has the following characteristics. It is
T-lymphotropic; its preferred target is constituted by Leu 3
cells (or T4 lymphocytes); it has reverse transcriptase activity
necessitating the presence of Mg2+ and exhibits strong affinity
for poly(adenylate-oligodeoxythymidylate)
(poly(A)-oligo(dT)12-18). It has a density of 1.16-1.17 in a
sucrose gradient, an average diameter of 139 nanometers, and a
nucleus having an average diameter of 41 nanometers. Antigens of
said virus, particularly a protein p25, are recognised
immunologically by antibodies contained in serums taken from
patients afflicted with LAS or AIDS. The p25 protein, which is a
core protein, is not immunologically cross-reactive with the p24
protein of the HTLV-I and HTLV-II viruses. The virus is also
free of a p19 protein which is immunologically cross-reactive
with the p19 proteins of HTLV-I and HTLV-II.

Retroviruses of this type (sometimes denoted by the generic
abbreviation LAV) have been filed in the National Collection of
Micro-organism Cultures of the INSTITUT PASTEUR of Paris, under
numbers I-232, I-240 and I-241. Virus strains similar to LAV in
all respects from the morphological and immunological point of
view have been isolated in other laboratories. Reference is made
by way of examples to the retrovirus strains named HTLV-III
isolated by R.C. GALLO et al., Science, 224, 500 (1984) and by
M.G. SARNGADHARAN et al., Science, 224, 506 (1984) respectively,
and to the retrovirus isolated by M. JAY LEVY et al., Science,
225, 840-842 (1984), which virus was designated ARV. For ease of
language the last mentioned viruses, as well as others which
have equivalent morphological and immunological properties, will
be designated hereafter under the generic designation "LAV".
Reference is also made to the European patent application filed
14 September 1984, with the priority of British patent
application number 83 24800 filed 15 September 1983, as regards
a more detailed description of the LAV retroviruses or the like
and of the uses to which extracts of these viruses give rise.

Initially the core antigens were the main antigens of the virus
lysates or extracts which were recognised by serums of patients
infected with AIDS or LAS, in the test systems which had then
been used. A p42 protein, presented as consisting of an envelope
protein, had been detected too. In the same manner GALLO et al.
disclosed a p41 protein which was also deemed to be a possible
component of the virus envelope.

Processes for obtaining a LAV virus have also been described.
Reference may be made particularly to the article already
mentioned of F. BARRE-SINOUSSI et al., as regards the
preparation of the virus in T lymphocyte cultures derived either
from blood, or from the umbilical cord, or also from bone marrow
cells of adult donors in good health. This process comprises
particularly the following essential steps:

- producing a viral infection of these T lymphocytes, after
activation by a lectin mitogen, with a viral suspension derived
from a crude supernatant liquor of lymphocytes producing the
virus (initially obtained from a patient infected with AIDS or
LAS),

- culturing the infected cells with TCGF, in the presence of
anti-α-interferon sheep serum,

- effecting purification of the virus produced (production
starts generally between the 9th and the 15th day following
infection and lasts from 10 to 15 days), which purification
comprises precipitating the virus in polyethylene glycol in
order to produce a first concentration of the virus, then
centrifuging the preparation obtained in a 20-60 % sucrose
gradient or in an isotonic gradient of metrizamide (sold under
the trade mark NYCODENZ by NYEGAARD, Oslo) and recovering the
virus with the band having a density of 1.16-1.17 in the sucrose
gradient or of 1.10-1.11 in the NYCODENZ gradient.

The LAV virus may also be produced from permanent cell lines of
type T, such as the CEM line, or from B lymphoblastoid cell
lines, such as obtained by the transformation of the lymphocytes
derived from a healthy donor with the Epstein-Barr virus, for
instance as disclosed in French patent application Nr. 84 07151
filed May 9, 1984. The permanent cell lines obtained produce
continuously a virus (designated as LAV-B in the case of the B
lymphoblastoid cell lines) which possesses the essential
antigenic and morphological features of the LAV viruses (except
that it is collected in a density band sometimes slightly higher
than in the preceding case (particularly 1.18) in sucrose). The
final purification of the virus can also be carried out in a
NYCODENZ gradient.

A method for cloning DNA sequences hybridizable with the genomic
RNA of LAV has already been disclosed in British Patent
Application Nr. 84 23659 filed on September 19, 1984. Reference
is hereafter made to that application as concerns subject matter
in common with the further improvements to the invention
disclosed herein.

The invention aims at providing purified unaltered virus forms
(or viruses less altered by the purification procedures resorted
to) and processes for obtaining said unaltered purified viruses.

The present invention further aims at providing additional new
means which should not only also be useful for the detection of
LAV or related viruses (hereafter more generally referred to as
"LAV viruses"), but also have more versatility, particularly in
detecting specific parts of the genomic DNA of said viruses
whose expression products are not always directly detectable by
immunological methods. The present invention further aims at
providing polypeptides containing sequences in common with
polypeptides comprising antigenic determinants included in the
proteins encoded and expressed by the LAV genome occurring in
nature. An additional object of the invention is to further
provide means for the detection of proteins related to LAV
virus, particularly for the diagnosis of AIDS or pre-AIDS or, to
the contrary, for the detection of antibodies against the LAV
virus or proteins related therewith, particularly in patients
afflicted with AIDS or pre-AIDS or more generally in
asymptomatic carriers and in blood-related products. Finally the
invention also aims at providing immunogenic polypeptides, and
more particularly protective polypeptides for use in the
preparation of vaccine compositions against AIDS or related
syndromes.

The present invention relates to additional DNA fragments,
hybridizable with the genomic RNA of LAV as they will be
disclosed hereafter, as well as with additional cDNA variants
corresponding to the whole genomes of LAV viruses. It further
relates to DNA recombinants containing said DNAs or cDNA
fragments.

An unaltered purified LAV retrovirus is distinguished from those
which have been defined above, in that it includes an amount of
one or several envelope antigens sufficient to be visualized
when the virus is labelled with 35S-cysteine, free of unlabelled
cysteine, in a proportion of 200 microcuries per ml of medium.
These antigens, among which particularly glycoproteins, are
recognised selectively in vitro by serums of patients affected
with AIDS or LAS or by the serums of asymptomatic carriers of
the virus.

A preferred antigen according to the preceding definition,
obtainable from a lysate of this virus (or by gentle scouring of
the envelopes of the virus), is a glycoprotein having a
molecular weight of the order of 110,000 daltons, as determined
by its migration distance in comparison with the migration
distances, in the same migration system, of standard proteins
having known molecular weights. In particular, comparative
measurements were made on a 12.5 % polyacrylamide gel under a
voltage of 18 V for 18 hours, using the following standard
proteins (marketed by AMERSHAM):

- lysozyme-(14C)-methyl (MW: 14,300),

- carbonic anhydrase-(14C)-methyl (MW: 30,000),

- ovalbumin-(14C)-methyl (MW: 46,000),

- bovine serum albumin-(14C)-methyl (MW: 69,000),

- phosphorylase b-(14C)-methyl (MW: 92,500),

- myosin-(14C)-methyl (MW: 200,000).
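
The comparison with standards described above can be sketched as a calibration curve. This is an illustration only: the migration distances below are hypothetical values invented for the example (the text gives none), and the linear relation between log10(MW) and migration distance is the usual working approximation for such gels, not something stated in the text.

```python
import math

# Molecular weights of the standards, as listed in the text (daltons).
STANDARDS_MW = [14_300, 30_000, 46_000, 69_000, 92_500, 200_000]

# HYPOTHETICAL migration distances in mm, one per standard, in the same
# order. These are invented for illustration and are not measured values.
HYPOTHETICAL_DISTANCES_MM = [95.0, 75.0, 62.0, 50.0, 41.0, 18.0]


def fit_log_linear(distances, weights):
    """Least-squares fit of log10(MW) = slope * distance + intercept."""
    ys = [math.log10(w) for w in weights]
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(distance, slope, intercept):
    """Read a molecular weight off the fitted calibration line."""
    return 10 ** (slope * distance + intercept)


slope, intercept = fit_log_linear(HYPOTHETICAL_DISTANCES_MM, STANDARDS_MW)
```

With real measurements, `estimate_mw` would be called with the migration distance of the band of interest; a band running between the phosphorylase b and myosin standards would be read off as lying between 92,500 and 200,000.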

The invention relates also to the antigens themselves,
particularly that of molecular weight of about 110,000-120,000,
which possess also the capability of being recognised by serums
of patients infected with AIDS or LAS or by serums of persons
who have been exposed to LAV viruses or those analogous with the
latter. These antigens have also the characteristic of forming
complexes with concanavalin A, said complex being dissociatable
in the presence of O-methyl-α-D-mannopyranoside. The antigens
according to the invention can also bind to other lectins, for
example those known under the name "LENTYL-LECTIN". The
preferred antigen according to the invention, of molecular
weight 110,000, is also sensitive to the action of
endoglycosidases. This action is manifested by the production
from the antigen of molecular weight 110,000 of a protein having
a molecular weight of the order of 90,000, the latter being
separable for example by immunoprecipitation or by separation
employing the differences in molecular weights (migrations
differentiated on gel).

Preferred antigens of the invention are constituted by
glycoproteins.

The invention relates also to the process for producing the
viruses according to the invention. This process is
distinguished essentially from those recalled above at the level
of the final purification operation. In particular, the
purification step of the process according to the invention is
no longer carried out in gradients, but involves the performance
of differential centrifugations effected directly on the
supernatants of the culture media of the producing cells. These
centrifugation operations comprise, particularly, a first
centrifugation at an angular centrifugation velocity,
particularly of 10,000 rpm, enabling the removal of non-viral
constituents, more particularly of cellular constituents, then a
second centrifugation at higher angular velocity, particularly
at 45,000 rpm, to obtain the precipitation of the virus itself.
In preferred embodiments, the first centrifugation at 10,000 rpm
is maintained for 10 minutes and the second at 45,000 rpm for 20
minutes. These are, of course, only indicative values, it being
understood that it remains within the ability of the specialist
to modify the centrifugation conditions, to provide for the
separation of the cellular constituents and of the viral
constituents.
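
Since the text gives rotation speeds only, a specialist adapting the conditions to another rotor would usually convert them to relative centrifugal force. The sketch below uses the standard conversion RCF = 1.118e-5 x r x rpm^2 (r in cm); the rotor radius is a hypothetical value chosen for illustration, as the text does not name a rotor.

```python
# Standard conversion from rotation speed to relative centrifugal force.
RCF_CONSTANT = 1.118e-5  # valid for a radius in cm and a speed in rpm

# HYPOTHETICAL rotor radius; the text does not specify the rotor used.
ASSUMED_RADIUS_CM = 8.0


def rcf(rpm, radius_cm):
    """Relative centrifugal force (multiples of g) at a given radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_for_rcf(target_rcf, radius_cm):
    """Speed needed on another rotor to reach the same force."""
    return (target_rcf / (RCF_CONSTANT * radius_cm)) ** 0.5


# The two indicative steps of the text: (speed in rpm, duration in minutes).
STEPS = [(10_000, 10), (45_000, 20)]

for speed, minutes in STEPS:
    force = rcf(speed, ASSUMED_RADIUS_CM)
    print(f"{speed} rpm for {minutes} min ~ {force:,.0f} x g (assumed radius)")
```

Whatever the radius, the second step applies 20.25 times the force of the first, since the force scales with the square of the speed.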

This modification of the purification process results in the
production of viral preparations from which the antigen
mentioned can then be isolated more easily than from virus
preparations purified by the previous methods. In any event, the
viruses finally obtained by the process of the present invention
are more easily recognised by serums of patients or of persons
who have been exposed to the LAV virus or to morphologically and
antigenically similar strains.

The antigens according to the invention can themselves be
obtained from the above disclosed viruses, by lysis (or other
suitable processing) of the latter in the presence of any
suitable detergent and by recovery and separation of the
antigens released. Advantageously, the lysis of the virus is
effected in the presence of aprotinin or of any other agent
suitable for inhibiting the action of proteases. The separation
of the antigens according to the invention can then be carried
out by any method known in itself; for example, it is possible
to proceed with a separation of the proteins by employing their
respectively different migrations in a predetermined gel, the
protein sought being then isolated from the zone of the gel in
which it would normally be found in an electrophoresis operation
under well determined conditions, having regard to its molecular
weight. The antigens according to the invention can however be
separated from the lysate of the abovesaid viruses, due to their
affinity for lectins, in particular concanavalin A or
lentyl-lectin. The lectin used is preferably immobilised on a
solid support, such as the cross linked polymer derived from
agarose and marketed under the trade mark SEPHAROSE. After
washing of the fixed antigens with a suitable buffer, the
antigens can be eluted in any suitable manner, particularly by
resorting to an O-methyl-α-D-mannopyranoside in solution.

A more thorough purification of these antigens can be performed
by immunoprecipitation with the serums of patients known to
possess antibodies effective against said protein, with
concentrated antibody preparations (polyclonal antibodies) or
again with monoclonal antibodies, more particularly directed
against the antigen according to the invention, in particular
that having the molecular weight of 110,000, denoted below by
the abbreviation gp110.

Additional characteristics of the invention will appear also in
the course of the description which follows of the isolation of
a virus according to the invention and of antigens, particularly
an envelope antigen of the virus. Reference will be made to the
drawings in which:

Figure 1 is derived from a photographic reproduction of gel
strips which have been used to carry out electrophoreses of
lysate extracts of T lymphocytes, respectively infected and
uninfected (controls) by a LAV suspension.

Figure 2 is the restriction map of a complete LAV genome (clone
λJ19).

Figures 3a to 3e are the complete sequence of a LAV viral
genome.

Figures 4 and 5 show diagrammatically parts of the three
possible reading phases of LAV genomic RNA, including the open
reading frames (ORF) apparent in each of said reading phases.

Figure 6 is a schematic representation of the LAV long terminal
repeat (LTR).
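
The three reading phases and their open reading frames, as represented in Figures 4 and 5, can be illustrated by a short sketch. It rests on two assumptions not taken from the text: an open reading frame is taken here to be any stretch of codons uninterrupted by a stop codon (the convention commonly used for such diagrams), and the example sequence is an arbitrary toy string, not LAV sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_reading_frames(seq, min_codons=1):
    """List the stop-free stretches in the three forward reading phases.

    Returns (phase, start, end) tuples, with phase numbered 1 to 3 and
    start/end as 0-based positions (end exclusive, stop codon excluded).
    """
    seq = seq.upper()
    found = []
    for phase in range(3):
        start = phase
        pos = phase
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOP_CODONS:
                # A stop codon closes the current stretch.
                if (pos - start) // 3 >= min_codons:
                    found.append((phase + 1, start, pos))
                start = pos + 3
            pos += 3
        # Stretch still open when the sequence ends.
        if (pos - start) // 3 >= min_codons:
            found.append((phase + 1, start, pos))
    return found


# Toy sequence for illustration only (not derived from the LAV genome).
TOY_SEQUENCE = "ATGAAATAGCCC"
for phase, start, end in open_reading_frames(TOY_SEQUENCE):
    print(f"phase {phase}: positions {start}-{end}")
```

Raising `min_codons` keeps only the long frames, which is how the major open reading frames stand out in a diagram of this kind.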

1 - PRODUCTION OF THE VIRUS AND OF ANTIGENS

T lymphocytes derived from a healthy donor and infected with
LAV1, under the conditions described by F. BARRE-SINOUSSI et
al., or CEM cells derived from a patient afflicted with leukemia
and also infected in vitro with LAV1, were kept under
cultivation in a medium containing 200 microcuries of
35S-cysteine and devoid of unlabelled cysteine. The infected
lymphocytes were cultured in a non denaturating medium to
prevent the degradation of the antigen sought. The supernatant
liquor from the culture medium was then subjected to a first
centrifugation at 10,000 rpm for 10 minutes to remove the non
viral components, then to a second centrifugation at 45,000 rpm
for 20 minutes for sedimenting the virus. The virus pellet was
then lysed by detergent in the presence of aprotinin (5 %),
particularly under the conditions described in the article of
F. BARRE-SINOUSSI et al.

The same operation was repeated on lymphocytes taken from a
healthy donor as control.

The various lysates were then immunoprecipitated by serums of
patients infected with AIDS or with LAS. Immunoprecipitations
were also carried out with serums originating from healthy
donors or from donors infected with other diseases. The media
were then subjected to electrophoreses in an SDS-polyacrylamide
gel.

The results are indicated in figure 1. The gel strips numbered
from 1 to 6 were obtained from preparations labelled by
35S-cysteine. The strips numbered 7 to 10 show results observed
on infected or uninfected lymphocyte preparations labelled with
35S-methionine. Finally the strip M corresponds to the migration
distances of the standard proteins identified above, whose
molecular weights are recalled in the right hand portion of the
figure.

The references to the labelled viral proteins appear on the left
hand side of the figure.

It is noted that columns 7 to 10 show the specific protein p25
of LAV, labelled with 35S-methionine. The same protein is absent
on strips 8 to 10 corresponding to results obtained with a
preparation originating from healthy lymphocytes.

Columns 3 and 5 correspond to the results which have been
observed on preparations obtained from lymphocytes infected and
labelled with 35S-cysteine. The proteins p25 and p18, the
characteristic core proteins of LAV, and the glycoprotein gp110,
also specific of LAV, were also present. Images corresponding to
a protein p41 (molecular weight of the order of 41,000) appeared
in the various preparations, although less distinctly.

The virus according to the invention and the antigen according
to the invention can be either precipitated by lectins,
particularly concanavalin A, or fixed to a SEPHAROSE-concanavalin
A column. Particularly the purification of the envelope
glycoproteins can be carried out as follows. This fixation can
particularly be carried out by contacting a lysate of the LAV
virus dissolved in a suitable buffer with concanavalin A bound
to SEPHAROSE. A suitable buffer has the following composition:

Tris                                              10 mM
NaCl                                              0.15 M
CaCl2                                             1 mM
MgCl2                                             1 mM
Detergent marketed under the trade mark TRITON    1 %
pH                                                7.4

When the fixation has been achieved, the SEPHAROSE-concanavalin
A is washed with a buffer of the same composition, except that
the TRITON concentration is lowered to 0.1 %. The elution is
then effected with a 0.2 M O-methyl-α-D-mannopyranoside solution
in the washing buffer.
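
For preparing the binding and washing buffers, the stated concentrations can be turned into weighed amounts. The sketch below is illustrative: the molar masses are standard values for Tris base and for the anhydrous salts, and the choice of those forms, as well as reading the TRITON percentage as volume per volume, are assumptions, since the text does not specify them.

```python
# Standard molar masses (g/mol). ASSUMPTION: Tris base and anhydrous
# CaCl2 / MgCl2 are used; hydrated salts would need other values.
MOLAR_MASS_G_PER_MOL = {
    "Tris": 121.14,
    "NaCl": 58.44,
    "CaCl2": 110.98,
    "MgCl2": 95.21,
}

# Concentrations from the text, in mol/l.
BUFFER_MOLAR = {"Tris": 0.010, "NaCl": 0.15, "CaCl2": 0.001, "MgCl2": 0.001}

# Detergent content from the text (percent); v/v is ASSUMED here.
BINDING_DETERGENT_PERCENT = 1.0
WASHING_DETERGENT_PERCENT = 0.1


def grams_needed(volume_l, concentrations=BUFFER_MOLAR):
    """Mass of each solute (g) for the requested buffer volume."""
    return {
        name: conc * MOLAR_MASS_G_PER_MOL[name] * volume_l
        for name, conc in concentrations.items()
    }


def detergent_ml(volume_l, percent):
    """Volume of detergent (ml) for a given percentage (v/v assumed)."""
    return volume_l * 1000.0 * percent / 100.0


for name, grams in grams_needed(1.0).items():
    print(f"{name}: {grams:.3f} g per litre")
print(f"detergent, binding buffer: {detergent_ml(1.0, BINDING_DETERGENT_PERCENT):.1f} ml")
print(f"detergent, washing buffer: {detergent_ml(1.0, WASHING_DETERGENT_PERCENT):.1f} ml")
```

The pH of 7.4 is not computed; it would be adjusted after dissolution in the usual way.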

The protein may be further concentrated by immunoprecipitation
with antibodies contained in the serums of patients infected
with AIDS or with polyclonal antibodies obtained from a serum
derived from an animal previously immunised against the
"unaltered" virus according to the invention or the abovesaid
glycoprotein. The protein can then be recovered by dissociation
of the complex by a solution having an adequate content of ionic
salt. Preferably the antibody preparation is itself immobilised
in a manner known in itself on an insoluble support, for
instance of the SEPHAROSE B type.

It is also possible to resort to monoclonal antibodies secreted
by hybridomas previously prepared against gp110. These
monoclonal antibodies, as well as the hybridomas which produce
them, also form part of the invention.

A technique for producing and selecting monoclonal antibodies
directed against the gp110 glycoprotein is described below.

Immunisation of the mice

Groups of Balb/c mice from 6 to 8 weeks old were used. One group
receives the virus carrying the abovesaid glycoprotein, another
a purified glycoprotein gp110. The immunisation procedure,
identical for all mice, comprises injecting 10 mg of the
antigenic preparation in the presence of Freund complete
adjuvant at day 0, then again but in the presence of Freund
incomplete adjuvant at day 14 and without adjuvant at days 28
and 42. The three first injections are made intraperitoneally,
the fourth intravenously.
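
The immunisation timetable can be written down as a small data structure, which makes the intervals easy to check. The sketch below only restates the days, adjuvants and routes given in the text, together with the fusion on the 45th day mentioned in the section that follows; the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Injection:
    day: int
    adjuvant: str
    route: str


# Schedule as described in the text.
SCHEDULE = [
    Injection(0, "Freund complete", "intraperitoneal"),
    Injection(14, "Freund incomplete", "intraperitoneal"),
    Injection(28, "none", "intraperitoneal"),
    Injection(42, "none", "intravenous"),
]

# Day of the fusion with myeloma cells, as stated in the fusion section.
FUSION_DAY = 45


def intervals_between_injections(schedule=SCHEDULE):
    """Days elapsed between consecutive injections."""
    days = sorted(inj.day for inj in schedule)
    return [later - earlier for earlier, later in zip(days, days[1:])]


def days_from_last_injection_to_fusion(schedule=SCHEDULE, fusion_day=FUSION_DAY):
    """Delay between the final boost and the fusion."""
    return fusion_day - max(inj.day for inj in schedule)


print(intervals_between_injections())
print(days_from_last_injection_to_fusion())
```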
Fusion and culture of the hybrids

The non secreting myeloma variant 5.53 P3 x 63 Ag8, resistant to
azaguanine, itself derived from the MOPC-21 cell-line, is used.
Fusion with immunised mouse splenocytes is carried out in the
presence of polyethylene-glycol 4000 by the technique of FAZEKAS
de St-GROTH and SCHEIDEGGER on the 45th day. The selection of
the hybrids in RPMI 1640 "HAT" medium is carried out in plates
having 24 cups (known under the designation COSTAR) by resorting
to the same culture techniques.

The hybridomas producing antibodies of adequate specificity are
then cloned in plates having 96 cups, in the presence of a
"feeder" layer of syngenic thymocytes. The producing clones thus
selected are then expanded in 24 cup plates, still in the
presence of thymocytes. When the confluence appears in one of
the cups, the clone is injected intraperitoneally into a Balb/c
mouse which had received an injection of PRISTANE 8 days
previously and/or kept in liquid culture.

Demonstration of the anti-LAV antibodies

Five different techniques enable characterisation of the clones
producing antibodies of suitable specificity. In a first stage,
the hybrids producing antibodies are determined by an ELISA test
revealing mouse immunoglobulins in the supernatant liquors. From
this first selection, supernatants are sought which have
antibodies directed against viral constituents by means of an
ELISA test revealing anti-LAV antibodies, or by
immunofluorescence on the virus producing human cells. Finally
the supernatant liquors are analysed by radio-immunoprecipitation
of virus labelled with cysteine and by the Western-Blot
technique on viral preparations, which permit the determination
of the specificities of these anti-LAV antibodies.

Cells obtained from the various fusions are placed under culture
in 648 cups. Their microscopic examination shows that the
majority of these cups contain a single hybrid clone capable of
growing in a "HAT" selective medium. More than 50 % among them
produce antibodies giving rise to a positive response under
ELISA antivirus examination. The most representative fusions are
tested by the Western-Blot technique and several of them are
subcloned, taking into account their respective specificities,
reactivities in antivirus ELISA and their behaviours under the
culturing conditions. Those hybrids which are more particularly
selected are those which produce antibodies which selectively
recognise the viral glycoprotein gp110 having a molecular weight
of about 110 kD. All the subclonings give rise to
antibody-producing clones which, after expansion, are injected
into syngenic mice. Analysis of the specificities of the
antibodies present in the different ascites liquids confirms the
specificity of the antibodies of said ascites with respect to
gp110.

The monoclonal antibodies obtained can themselves be employed to
purify proteins containing an antigenic site also contained in
gp110. The invention relates therefore also to these processes
of purification as such. This process is advantageously applied
to virus lysates or T lymphocyte lysates or other cells
producing LAV or the like, when care has been taken to avoid the
uncontrolled separation of gp110 during the purification
procedure of the virus, prior to lysis thereof. Needless to say,
the process can also be applied to any solution containing gp110
or a protein, polypeptide or glycoprotein comprising an
antigenic site normally carried by the envelope protein and
recognised by the monoclonal antibody. For practising this
process, the monoclonal antibodies are advantageously
immobilised on a solid support, preferably adapted to affinity
chromatography operations. For example, these monoclonal
antibodies are fixed to an agarose lattice with
three-dimensional cross-linking, marketed under the trade mark
SEPHAROSE by the Swedish company PHARMACIA A.G., for example by
the cyanogen bromide method.

The invention therefore also relates to a process for separating
the antigens concerned, which process comprises contacting a
mixture of antigens, including those of interest (for instance a
virus lysate or extract), with an affinity column bearing the
abovesaid monoclonal antibodies, to selectively fix
polypeptides, proteins or glycoproteins selectively recognized
by said monoclonal antibodies, recovering the latter by
dissociation of the antigen-antibody complex by means of a
suitable buffer, particularly a solution of adequate ionic
strength, for example of a salt, preferably ammonium acetate
(which leaves no residue upon freeze drying of the preparation),
or a solution acidified to a pH 2-4 or a glycine buffer at the
same pH, and recovering the eluted polypeptides, proteins or
glycoproteins.

It is self-evident that the invention relates also to
polypeptide fragments having lower molecular weights and
carrying antigenic sites recognisable by the same monoclonal
antibodies. It is clear to the specialist that the availability
of monoclonal antibodies recognizing the gp110 glycoprotein
gives also access to smaller peptide sequences or fragments
containing the common antigenic site or epitope. Fragments of
smaller sizes may be obtained by resorting to known techniques.
For instance such a method comprises cleaving the original
larger polypeptide by enzymes capable of cleaving it at specific
sites. By way of examples of such proteases may be mentioned the
enzyme of Staphylococcus aureus V8, α-chymotrypsin, "mouse
sub-maxillary gland protease" marketed by the BOEHRINGER
company, Vibrio alginolyticus chemovar iophagus collagenase,
which specifically recognises said peptides Gly-Pro and Gly-Ala,
etc.

It is also possible to obtain polypeptides or fragments of
envelope antigens of the virus, by cloning fragments excised
from a cDNA constructed from genomes of LAV variants.

Figures 2 and 3 are restriction maps of such a cDNA comprising a
total of 9.1 to 9.2 kb. The polypeptides coded by cDNA fragments
located in the region extending between site KpnI (position
6100) and site BglII (position 9150) of the restriction map of
Figure 2. The presence of a characteristic site of an envelope
antigen of the LAV virus or the like in any polypeptide
expressed (in a suitable host cell transformed beforehand by a
corresponding fragment or by a vector containing said fragment)
can be detected by any suitable immunochemical means.

More particularly, the invention relates to polypeptides encoded
by cDNA fragments defined hereafter. It also relates to the
nucleic acid fragments themselves, including a cDNA variant
corresponding to a whole LAV retroviral genome, characterized by
a series of restriction sites in the order hereafter (from the
5' end to the 3' end).

The coordinates of the successive sites of the whole LAV genome
(see also restriction map of λJ19 in fig. 1) are indicated
hereafter too, with respect to the Hind III site (selected as of
coordinate 1) which is located in the R region. The coordinates
are estimated with an accuracy of ± 200 bp:

Hind III    0
Sac I       50
Hind III    520
Pst I       800
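
Distances between mapped sites follow directly from such coordinates. The sketch below is a minimal illustration using only the four sites listed above, with the Pst I coordinate taken as 800; the function names are illustrative, and every distance inherits the stated accuracy of ± 200 bp.

```python
# Site coordinates (bp) as listed in the text; Pst I is taken as 800.
SITES = [("Hind III", 0), ("Sac I", 50), ("Hind III", 520), ("Pst I", 800)]


def intervals(sites):
    """Distance in bp between each pair of neighbouring sites."""
    ordered = sorted(sites, key=lambda site: site[1])
    return [
        (left[0], right[0], right[1] - left[1])
        for left, right in zip(ordered, ordered[1:])
    ]


def fragments_for(enzyme, sites):
    """Fragment lengths produced by cutting at every site of one enzyme."""
    positions = sorted(pos for name, pos in sites if name == enzyme)
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


for left, right, size in intervals(SITES):
    print(f"{left} -> {right}: {size} bp")
print("Hind III fragments:", fragments_for("Hind III", SITES))
```

On these coordinates the first Hind III fragment comes out at about 520 bp and contains the Sac I site.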

Another DNA variant according to this invention optionally
contains an additional Hind III site approximately at the 5 550
coordinate.

Reference is further made to fig. 1 which shows a more detailed
restriction map of said whole DNA (λJ19).

An even more detailed nucleotidic sequence of a preferred DNA
according to the invention is shown in figs. 4a-4e hereafter.

The invention further relates to other preferred DNA fragments
and polypeptide sequences (glycosylated or not glycosylated)
which will be referred to hereafter.

SEQUENCING OF LAV

The sequencing and determination of sites of particular interest
were carried out on a phage recombinant corresponding to λJ19
disclosed in the abovesaid British Patent application
Nr. 84 23659. A method for preparing it is disclosed in that
application.

The whole recombinant phage DNA of clone λJ19 (disclosed in the
earlier application) was sonicated according to the protocol of
DEININGER (1983), Analytical Biochem. 129, 216. The DNA was
repaired by a Klenow reaction for 12 hours at 16°C. The DNA was
electrophoresed through 0.8 % agarose gel and DNA in the size
range of 300-600 bp was cut out and electroeluted and
precipitated. Resuspended DNA (in 10 mM Tris, pH 8; 0.1 mM EDTA)
was ligated into M13mp8 RF DNA (cut by the restriction enzyme
SmaI and subsequently alkaline phosphatased), using T4 DNA- and
RNA-ligases (Maniatis T. et al. (1982) - Molecular cloning -
Cold Spring Harbor Laboratory). An E. coli strain designated as
TG1 was used for further study. This strain has the following
genotype:

Δlac pro, supE, thi, F' traD36, proAB, lacIq, ZΔM15, r-

This E. coli TG1 strain has the peculiarity of enabling
recombinants to be recognized easily. The blue colour of the
cells transfected with plasmids which did not recombine with a
fragment of LAV DNA is not modified. To the contrary, cells
transfected by a recombinant plasmid containing a LAV DNA
fragment yield white colonies. The technique which was used is
disclosed in Gene (1983), 26, 101.

This strain was transformed with the ligation mix using the
Hanahan method (Hanahan D. (1983) J. Mol. Biol. 166, 557). Cells
were plated out on tryptone-agarose plate with IPTG and X-gal in
soft agarose. White plaques were either picked and screened or
screened directly in situ using nitrocellulose filters. Their
DNAs were hybridized with nick-translated DNA inserts of pUC18
Hind III subclones of λJ19. This permitted the isolation of the
plasmids or subclones of λ which are identified in the table
hereafter. In relation to this table it should also be noted
that the designation of each plasmid is followed by the
deposition number of a cell culture of E. coli TG1 containing
the corresponding plasmid at the "Collection Nationale des
Cultures de Micro-organismes" (C.N.C.M.) of the Pasteur
Institute in Paris, France. A non-transformed TG1 cell line was
also deposited at the C.N.C.M. under Nr. I-364. All these
deposits took place on November 15, 1984. The sizes of the
corresponding inserts derived from the LAV genome have also been
indicated.



WO 86/02383 



PCT/EP85/00548 




TABLE

Essential features of the recombinant plasmids

- pJ19-1 plasmid (I-365) 0.5 kb
  Hind III - Sac I - Hind III

- pJ19-17 plasmid (I-367) 0.6 kb
  Hind III - Pst I - Hind III

- pJ19-6 plasmid (I-366) 1.5 kb
  Hind III (5')
  Bam HI
  Xho I
  Kpn I
  Bgl II
  Sac I (3')
  Hind III

- pJ19-13 plasmid (I-368) 6.7 kb
  Hind III (5')
  Bgl II
  Kpn I
  Kpn I
  Eco RI
  Eco RI
  Sal I
  Kpn I
  Bgl II
  Bgl II
  Hind III (3')
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Positively hybridizing M13 phage plates were grown 
up for 5 hours and the single- stranded DNAs were 
extracted. 

M13mp8 subclones of λJ19 DNAs were sequenced
according to the dideoxy method and technology devised by
Sanger et al. (Sanger et al. (1977), Proc. Natl. Acad. Sci.
USA, 74, 5463 and M13 cloning and sequencing handbook,
AMERSHAM (1983)). The 17-mer oligonucleotide primer,
α-35S-dATP (400 Ci/mmol, AMERSHAM), and 0.5X-5X buffer
gradient gels (Biggin M.D. et al. (1983), Proc. Natl. Acad.
Sci. USA, 80, 3963) were used. Gels were read and put into
the computer under the programs of Staden (Staden R.
(1982), Nucl. Acids Res. 10, 4731). All the appropriate
references and methods can be found in the AMERSHAM M13
cloning and sequencing handbook.

The complete DNA sequence of λJ19 (also designated
as LAV-Ia) is shown in figs. 4a to 4e.

The sequence was reconstructed from the sequence
of the phage λJ19 insert. The numbering starts at the cap
site, which was located experimentally (see hereafter).
Important genetic elements, major open reading frames and
their predicted products are indicated together with the
Hind III cloning sites. The potential glycosylation sites
in the env gene are overlined. The NH2-terminal sequence of
p25 gag determined by protein microsequencing is boxed.

Each nucleotide was sequenced on average 5.3 times:
85 % of the sequence was determined on both strands and
the remainder sequenced at least twice from independent
clones. The base composition is T, 22.2 %; C, 17.8 %; A,
35.8 %; G, 24.2 %; G + C, 42 %. The dinucleotide CG is
greatly under-represented (0.9 %), as is common amongst
eukaryotic sequences (Bird 1980).
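
The composition figures above can be cross-checked with a few lines of Python. This is only a sketch: the function names and the toy sequence are illustrative assumptions, and the G value is read as 24.2 %, the figure that makes the four bases sum to 100 % and gives the stated G + C of 42 %.

```python
from collections import Counter

def base_composition(seq):
    """Return the percentage of each base A, C, G, T in seq."""
    seq = seq.upper()
    counts = Counter(seq)
    return {base: 100.0 * counts[base] / len(seq) for base in "ACGT"}

def dinucleotide_percent(seq, dinuc):
    """Percentage of overlapping dinucleotide positions equal to dinuc."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return 100.0 * pairs.count(dinuc.upper()) / len(pairs)

# Percentages quoted in the text (G read as 24.2 %, see lead-in).
reported = {"T": 22.2, "C": 17.8, "A": 35.8, "G": 24.2}
total = round(sum(reported.values()), 1)
g_plus_c = round(reported["G"] + reported["C"], 1)

# Hypothetical toy sequence, used only to exercise the two functions.
toy = "AAGCTTAACG"
toy_composition = base_composition(toy)
toy_cg = dinucleotide_percent(toy, "CG")
```

Applied to the full sequence of figs. 4a to 4e, the same two functions would be expected to reproduce the quoted composition and dinucleotide frequency.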

Figs. 5 and 6 provide a diagrammatized
representation of the lengths of the successive open
reading frames corresponding to the successive reading
phases (also referred to by numbers "1", "2" and "3"
appearing in the left-hand side part of fig. 5). The
relative positions of these open reading frames (ORF) with
respect to the nucleotidic structure of the LAV genome are
referred to by the scale of numbers representative of the
respective positions of the corresponding nucleotides in
the DNA sequence. The vertical bars correspond to the
positions of the corresponding stop codons.

The following genes and DNA fragments can be
distinguished on the different reading frames shown.
Reference is then also made to the proteins or
glycoproteins encoded by said genes and fragments.
1) The "gag gene" (or ORF-gag)

The "gag gene" codes for core proteins. 

gag : near the 5' extremity of the gag orf is a
"typical" initiation codon (Kozak 1984) (position 336)
which is not only the first in the gag orf, but the first
from the cap site. The precursor protein is 500 aminoacids
long. Calculated MW = 55841 agrees with the 55 kd gag
precursor polypeptide. The N-terminal aminoacid sequence of
the major core protein p25 is encoded by the nucleotide
sequence starting from position 732 (fig. 5a). This
formally makes the link between the cloned LAV genome and
the immunologically characterized LAV p25 protein. The
protein encoded 5' of the p25 coding sequence is rather
hydrophilic. Its calculated MW of 14866 is consistent with
that of the gag protein p18. The 3' part of the gag region
probably codes for the retroviral nucleic acid binding
protein (NBP). Indeed, as in HTLV-1 (Seiki et al., 1983)
and RSV (Schwartz et al., 1983), the motif
Cys-X2-Cys-X8-9-Cys common to all NBP (Oroszlan et al.,
1984) is found duplicated (nucleotides 1509 and 1572 in the
LAV sequence). Consistent with its function, the putative
NBP is extremely basic (17 % Arg + Lys).

In particular, it appears that a genomic fragment
(ORF-gag) thought to code for the core antigens, including
the p25, p18 and p13 proteins, is located between
nucleotidic position 312 (starting with 5' CTA GCG GAG 3')
and nucleotidic position 1835 (ending by CTCG TCA CAA 3').
The structure of the peptides or proteins encoded by parts
of said ORF is deemed to be that corresponding to phase 2.

The methionine aminoacid "M" coded by the ATG at
position 336-338 is the probable initiation methionine of
the gag protein precursor. The end of ORF-gag and
accordingly of gag protein appears to be located at
position 1835.
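
The coordinates quoted for gag can be checked against the stated precursor length with simple arithmetic. The sketch below assumes that position 1835 is the last base of the last sense codon (an assumption, not something the text states); the function names are illustrative.

```python
def codon_count(first_nt, last_nt):
    """Number of complete triplets between two inclusive nucleotide positions."""
    length = last_nt - first_nt + 1
    if length % 3 != 0:
        raise ValueError("span is not a whole number of triplets")
    return length // 3

def residue_number(nt_position, first_codon_nt):
    """1-based residue whose codon starts at nt_position, counting the codon
    at first_codon_nt as residue 1."""
    offset = nt_position - first_codon_nt
    if offset % 3 != 0:
        raise ValueError("position is not in frame with the start codon")
    return offset // 3 + 1

# ATG at 336 through the stated end of ORF-gag at 1835 (assumed inclusive).
gag_codons = codon_count(336, 1835)

# Residue of the gag precursor at which the p25 coding sequence (732) begins.
p25_first_residue = residue_number(732, 336)
```

Under that assumption the span holds 500 triplets, matching the 500-aminoacid precursor, and position 732 is in frame with the ATG and corresponds to residue 133 of the precursor.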

The beginning of the p25 protein, thought to start
by a Pro-Ile-Val-Gln-Asn-Ile-Gln-Gly-Gln-Met-Val-His-
aminoacid sequence, is thought to be coded for by the
nucleotidic sequence CCTATA..., starting at position 732.
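
The correspondence between the short nucleotide strings quoted in this section and their aminoacids can be verified by direct translation with the standard genetic code. The following is a minimal sketch; the table construction and the `translate` helper are illustrative and are not taken from the specification.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}

def translate(dna):
    """Translate a DNA string (spaces ignored) into hyphenated three-letter codes.

    Trailing bases that do not form a complete triplet are dropped.
    """
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    return "-".join(THREE_LETTER[CODON_TABLE[codon]] for codon in codons)

p25_start = translate("CCTATA")        # start of the p25 coding sequence
orf_gag_start = translate("CTA GCG GAG")  # first triplets of ORF-gag at 312
```

The first two triplets at position 732 translate to Pro-Ile, in agreement with the N-terminal sequence given for p25.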

The invention is thus more particularly concerned
with and relates to:
- the DNA sequence, extending from nucleotide 336 up to
about nucleotide 1650, deemed to encode a p55 protein
which is considered as containing aminoacid sequences
corresponding to those of the core proteins p18 and p25 of
the LAV virus;
- the DNA sequence, extending from nucleotide 732 up to
about nucleotide 1300, deemed to encode the p25 protein;
- the DNA sequence, extending from about nucleotide 1371 to
about nucleotide 1650, deemed to encode the p13 protein;
- the DNA sequence, extending from nucleotide 336 up to
about nucleotide 611, deemed to encode the p18 protein.

The invention also relates to the purified
polypeptides which have the aminoacid structures encoded by
the abovesaid fragments, particularly the p13, p18, p25,
p55 proteins or polypeptides which have the structures
corresponding to those resulting from the direct
translations of the DNA sequences or fragments which have
been defined more specifically hereabove, which peptidic
sequences flow directly from fig. 4a. More particularly
the invention relates to purified polypeptides having
peptidic sequences identical or equivalent to those
encoded by the DNA sequences extending from the following
nucleotide positions:

- 336 to 1650 (p55)

- 336 to 611 (p18)

- 1371 to 1650 (p13)

- 732 to 1300 (p25).

It should be mentioned that the p13, p18 and p25
all appear to derive from a same precursor, i.e. p55.

The invention further concerns polypeptide
fragments encoded by corresponding DNA fragments of the gag
open reading frame. Particularly hydrophilic peptides in
the gag open reading frame are identified hereafter. They
are defined starting from aminoacid 1 = Met coded by the
ATG starting from 336-338 in the LAV DNA sequence (fig.
3a) and then further numbered in accordance with their
order in the gag sequence. The first and second numbers in
relation to each peptide refer to the respective
N-terminal and C-terminal aminoacids.

Those hydrophilic peptides include:
aminoacids 12-32 inclusive, i.e. Glu-Leu-Asp-Arg-Trp-Glu-
    Lys-Ile-Arg-Leu-Arg-Pro-Gly-Gly-Lys-Lys-Lys-Tyr-
    Lys-Leu-Lys
aminoacids 37-46 inclusive, i.e. Ala-Ser-Arg-Glu-Leu-Glu-
    Arg-Phe-Ala-Val-
aminoacids 49-79 inclusive, i.e. Gly-Leu-Leu-Glu-Thr-Ser-
    Glu-Gly-Cys-Arg-Gln-Ile-Leu-Gly-Gln-Leu-Gln-Pro-
    Ser-Leu-Gln-Thr-Gly-Ser-Glu-Glu-Leu-Arg-Ser-Leu-
    Tyr-
aminoacids 88-153 inclusive, i.e. Val-His-Gln-Arg-Ile-
    Glu-Ile-Lys-Asp-Thr-Lys-Glu-Ala-Leu-Asp-Lys-Ile-
    Glu-Glu-Glu-Gln-Asn-Lys-Ser-Lys-Lys-Lys-Ala-Gln-
    Gln-Ala-Ala-Ala-Asp-Thr-Gly-His-Ser-Ser-Gln-Val-
    Ser-Gln-Asn-Tyr-Pro-Ile-Val-Gln-Asn-Ile-Gln-Gly-
    Gln-Met-Val-His-Gln-Ala-Ile-Ser-Pro-Arg-Thr-Leu-
    Asn-
aminoacids 158-165 inclusive, i.e. Val-Val-Glu-Glu-
    Lys-Ala-Phe-Ser-
aminoacids 178-188 inclusive, i.e. Gly-Ala-Thr-Pro-Gln-
    Asp-Leu-Asn-Thr-Met-Leu-
aminoacids 200-220 inclusive, i.e. Met-Leu-Lys-Glu-Thr-
    Ile-Asn-Glu-Glu-Ala-Ala-Glu-Trp-Asp-Arg-Val-His-
    Pro-Val-His-Ala-
aminoacids 226-234 inclusive, i.e. Gly-Gln-Met-Arg-Glu-
    Pro-Arg-Gly-Ser-
aminoacids 239-264 inclusive, i.e. Thr-Thr-Ser-Thr-Leu-
    Gln-Glu-Gln-Ile-Gly-Trp-Met-Thr-Asn-Asn-Pro-
    Pro-Ile-Pro-Val-Gly-Glu-Ile-Tyr-Lys-Arg-
aminoacids 288-331 inclusive, i.e. Gly-Pro-Lys-Glu-Pro-
    Phe-Arg-Asp-Tyr-Val-Asp-Arg-Phe-Tyr-Lys-Thr-Leu-
    Arg-Ala-Glu-Gln-Ala-Ser-Gln-Glu-Val-Lys-Asn-Trp-
    Met-Thr-Glu-Thr-Leu-Leu-Val-Gln-Asn-Ala-Asn-Pro-
    Asp-Cys-Lys-
aminoacids 352-361 inclusive, i.e. Gly-Val-Gly-Gly-Pro-
    Gly-His-Lys-Ala-Arg-
aminoacids 377-390 inclusive, i.e. Met-Met-Gln-Arg-Gly-
    Asn-Phe-Arg-Asn-Gln-Arg-Lys-Ile-Val-
aminoacids 399-432 inclusive, i.e. Gly-His-Ile-Ala-Arg-
    Asn-Cys-Arg-Ala-Pro-Arg-Lys-Lys-Gly-Cys-Trp-Lys-
    Cys-Gly-Lys-Glu-Gly-His-Gln-Met-Lys-Asp-Cys-Thr-
    Glu-Arg-Gln-Ala-Asn-
aminoacids 437-484 inclusive, i.e. Ile-Trp-Pro-Ser-Tyr-
    Lys-Gly-Arg-Pro-Gly-Asn-Phe-Leu-Gln-Ser-Arg-
    Pro-Glu-Pro-Thr-Ala-Pro-Pro-Glu-Glu-Ser-Phe-
    Arg-Ser-Gly-Val-Glu-Thr-Thr-Thr-Pro-Ser-Gln-
    Lys-Gln-Glu-Pro-Ile-Asp-Lys-Glu-Leu-Tyr-
aminoacids 492-498 inclusive, i.e. Leu-Phe-Gly-Asn-Asp-
    Pro-Ser-

The invention also relates to any combination of
these peptides.
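
The specification does not say how the hydrophilic stretches listed above were identified. As a purely illustrative sketch, the snippet below scores a peptide with the Kyte-Doolittle hydropathy scale (an assumption on our part, not the method of the text) and computes its Arg + Lys content. Negative mean values indicate a hydrophilic peptide on this scale; the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values, keyed by three-letter residue codes.
KYTE_DOOLITTLE = {
    "Ala": 1.8, "Arg": -4.5, "Asn": -3.5, "Asp": -3.5, "Cys": 2.5,
    "Gln": -3.5, "Glu": -3.5, "Gly": -0.4, "His": -3.2, "Ile": 4.5,
    "Leu": 3.8, "Lys": -3.9, "Met": 1.9, "Phe": 2.8, "Pro": -1.6,
    "Ser": -0.8, "Thr": -0.7, "Trp": -0.9, "Tyr": -1.3, "Val": 4.2,
}

def residues(peptide):
    """Split a hyphenated three-letter peptide string into residue codes."""
    return [r for r in peptide.replace(" ", "").split("-") if r]

def mean_hydropathy(peptide):
    """Average Kyte-Doolittle value over all residues of the peptide."""
    codes = residues(peptide)
    return sum(KYTE_DOOLITTLE[r] for r in codes) / len(codes)

def basic_percent(peptide):
    """Percentage of residues that are Arg or Lys."""
    codes = residues(peptide)
    return 100.0 * sum(r in ("Arg", "Lys") for r in codes) / len(codes)

# gag aminoacids 12-32 as listed above.
gag_12_32 = ("Glu-Leu-Asp-Arg-Trp-Glu-Lys-Ile-Arg-Leu-Arg-Pro-Gly-Gly-"
             "Lys-Lys-Lys-Tyr-Lys-Leu-Lys")
score = mean_hydropathy(gag_12_32)
basic = basic_percent(gag_12_32)
```

For this peptide the mean score is clearly negative and 9 of the 21 residues are Arg or Lys, which is consistent with its classification as hydrophilic.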
2) The "pol gene" (or ORF-pol)

Pol : The reverse transcriptase gene can encode a 
protein of up to 1,003 aminoacids (calculated MW = 
113629). Since the first methionine codon is 92 triplets 
from the origin of the open reading frame, it is possible 
that the protein is translated from a spliced messenger 
RNA, so giving a gag-pol polyprotein precursor. 

The pol coding region is the only one in which
significant homology has been found with other retroviral
protein sequences, three domains of homology being
apparent. The first is a very short region of 17
aminoacids (starting at 1856). Homologous regions are
located within the p15 gag protease of RSV (Dittmar and
Moelling 1978) and a polypeptide encoded by an open reading
frame located between gag and pol of HTLV-I (fig. 5)
(Schwartz et al., 1983, Seiki et al., 1983). This first
domain could thus correspond to a conserved sequence in
viral proteases. Its different location within the three
genomes may not be significant since retroviruses, by
splicing or other mechanisms, express a gag-pol
polyprotein precursor (Schwartz et al., 1983, Seiki et
al., 1983). The second and most extensive region of
homology (starting at 2048) probably represents the core
sequence of the reverse transcriptase. Over a region of
250 aminoacids, with only minimal insertions or deletions,
LAV shows 38 % aminoacid identity with RSV, 25 % with
HTLV-I and 21 % with MoMuLV (Shinnick et al., 1981), while
HTLV-I and RSV show 38 % identity in the same region. A
third homologous region is situated at the 3' end of the
pol reading frame and corresponds to part of the pp32
peptide of RSV that has exonuclease activity (Misra et
al., 1982). Once again, there is greater homology with the
corresponding RSV sequence than with HTLV-I.

Figs. 4a-4c also show that the DNA fragment
extending from nucleotidic position 1631 (starting with
5' TTT TTT .... 3') to nucleotidic position 5162 is thought
to correspond to the pol gene. The polypeptidic structure
of the corresponding polypeptides is deemed to be that
corresponding to phase 1. It stops at position 4639 (end
by 5' G GAT GAG GAT 3').

These genes are thought to code for the virus
polymerase or reverse transcriptase.
3) The envelope gene (or ORF-env)

env : The env open reading frame has a possible
initiator methionine codon very near the beginning (8th
triplet). If so the molecular weight of the presumed env
precursor protein (861 aminoacids, MWcalc = 97376) is
consistent with the size of the LAV glycoprotein (110 kd
and 90 kd after glycosidase treatment). There are 32
potential N-glycosylation sites (Asn-X-Ser/Thr) which are
overlined in Fig. 4d and 4e. An interesting feature of env
is the very high number of Trp residues at both ends of
the protein.
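
Potential N-glycosylation sites of the Asn-X-Ser/Thr type can be located with a short pattern search. The sketch below follows the motif exactly as written in the text, with X standing for any aminoacid; some definitions additionally exclude Pro at X, which is not applied here. The test fragment is a hypothetical one-letter string modelled on env residues 397-424, and the function name is illustrative.

```python
import re

# Zero-width lookahead so that overlapping motifs are all reported.
SEQUON = re.compile(r"(?=N[A-Z][ST])")

def find_sequons(protein):
    """Return 1-based positions of Asn residues in Asn-X-Ser/Thr motifs."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]

# Hypothetical fragment in one-letter code (assumption, for illustration).
fragment = "CNSTQLFNSTWFNSTWSTEGSNNTEGSD"
sites = find_sequons(fragment)
```

Run over the full 861-aminoacid env precursor, a search of this kind would be expected to return the potential sites overlined in Fig. 4d and 4e.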

The DNA sequence thought to code for envelope
proteins is thought to extend from nucleotidic position
5746 (starting with 5' AAA GAG GAG A.... 3') up to
nucleotidic position 8908 (ending by .....A ACT AAA GAA
3'). Polypeptidic structures of sequences of the envelope
protein correspond to those read according to the "phase
3" reading phase.

The start of env translation is thought to be at
the level of the ATG codon at position 5767-5769.

There are three hydrophobic regions,
characteristic of the retroviral envelope proteins (Seiki
et al., 1983), corresponding to a signal peptide (encoded
by nucleotides 5815-5850), a second region (7315-7350)
and a transmembrane segment (7831-7890). The second
hydrophobic region (7315-7350) is preceded by a
stretch rich in Arg + Lys. It is possible that this
represents a site of proteolytic cleavage which, by
analogy with other retroviral proteins, would give an
external envelope polypeptide and a membrane associated
protein (Seiki et al., 1983, Kiyokawa et al., 1984). A
striking feature of the LAV envelope protein sequence is
that the segment encoding the transmembrane protein is of
unusual length (150 residues). The env protein shows no
homology to any sequence in protein data banks. The small
aminoacid motif common to the transmembrane proteins of all
leukemogenic retroviruses (Cianciolo et al., 1984) is not
present in LAV env.

The invention concerns more particularly the DNA
sequence extending from nucleotide 5767 up to nucleotide
7314 deemed to encode the gp110 (envelope glycoprotein of
the LAV virus which has a molecular weight of about
110,000 daltons) beginning at about nucleotide as well as
the polypeptidic backbone of the glycoprotein sequence
which corresponds to that having an approximate molecular
weight which was initially believed to be 90,000 daltons,
and which turned out to be 55,000. The polypeptide
resulting from the complete removal of sugar residues of
gp110 can be obtained by the treatment of said gp110 with
the appropriate glycosidase.

The invention further relates to the purified
polypeptides which have the aminoacid structure (or
polypeptidic backbone) of the gp110 and gp90, which
correspond to the direct translation of the DNA sequences
and fragments which have been defined more specifically
hereabove (figs 4d and 4e).

The invention further relates to polypeptides
containing neutralizing epitopes. [...] located at the
periphery of and to be exposed outwardly with respect to
the normal conformation of the proteins. Consequently they
are considered as being epitopes which can efficiently be
brought into play in vaccine compositions.

The invention thus concerns more particularly
peptide sequences included in the env-proteins and
excisable therefrom (or having the same aminoacid
structure), having sizes not exceeding 200 aminoacids.

Preferred peptides of this invention (referred to
hereafter as a, b, c, d, e, f) are deemed to correspond to
those encoded by the nucleotide sequences which extend
respectively between the following positions:
a) from about 6171 to about 6276
b) from about 6336 to about 6386
c) from about 6466 to about 6516
d) from about 6561 to about 6696
e) from about 6936 to about 7006
f) from about 7611 to about 7746

Other hydrophilic peptides in the env open reading
frame are identified hereafter. They are defined starting
from aminoacid 1 = lysine coded by the AAA at position
5746-5748 in the LAV DNA sequence (figs 4d and 4e) and
then further numbered in accordance with their order with
respect to the env sequence. The first and second numbers
in relation to each peptide refer to their respective
N-terminal and C-terminal aminoacids.

These hydrophilic peptides are:
aminoacids 8-23 inclusive, i.e. Met-Arg-Val-Lys-Glu-Lys-
    Tyr-Gln-His-Leu-Trp-Arg-Trp-Gly-Trp-Lys-
aminoacids 63-78 inclusive, i.e. Ser-Asp-Ala-Lys-Ala-Tyr-
    Asp-Thr-Glu-Val-His-Asn-Val-Trp-Ala-Thr-
aminoacids 82-90 inclusive, i.e. Val-Pro-Thr-Asp-Pro-Asn-
    Pro-Gln-Glu-
aminoacids 97-123 inclusive, i.e. Thr-Glu-Asn-Phe-Asn-
    Met-Trp-Lys-Asn-Asp-Met-Val-Glu-Gln-Met-His-Glu-
    Asp-Ile-Ile-Ser-Leu-Trp-Asp-Gln-Ser-Leu-
aminoacids 127-183 inclusive, i.e. Val-Lys-Leu-Thr-Pro-
    Leu-Cys-Val-Ser-Leu-Lys-Cys-Thr-Asp-Leu-Gly-Asn-
    Ala-Thr-Asn-Thr-Asn-Ser-Ser-Asn-Thr-Asn-Ser-Ser-
    Ser-Gly-Glu-Met-Met-Met-Glu-Lys-Gly-Glu-Ile-Lys-
    Asn-Cys-Ser-Phe-Asn-Ile-Ser-Thr-Ser-Ile-Arg-Gly-
    Lys-Val-Gln-Lys-
aminoacids 197-201 inclusive, i.e. Leu-Asp-Ile-Ile-Pro-
    Ile-Asp-Asn-Asp-Thr-Thr-
aminoacids 239-294 inclusive, i.e. Lys-Cys-Asn-Asn-Lys-
    Thr-Phe-Asn-Gly-Thr-Gly-Pro-Cys-Thr-Asn-Val-Ser-
    Thr-Val-Gln-Cys-Thr-His-Gly-Ile-Arg-Pro-Val-Val-
    Ser-Thr-Gln-Leu-Leu-Leu-Asn-Gly-Ser-Leu-Ala-Glu-
    Glu-Glu-Val-Val-Ile-Arg-Ser-Ala-Asn-Phe-Thr-Asp-
    Asn-Ala-Lys-
aminoacids 300-327 inclusive, i.e. Leu-Asn-Gln-Ser-Val-Glu-
    Ile-Asn-Cys-Thr-Arg-Pro-Asn-Asn-Asn-Thr-Arg-Lys-
    Ser-Ile-Arg-Ile-Gln-Arg-Gly-Pro-Gly-Arg-
aminoacids 334-381 inclusive, i.e. Lys-Ile-Gly-Asn-Met-
    Arg-Gln-Ala-His-Cys-Asn-Ile-Ser-Arg-Ala-Lys-Trp-
    Asn-Ala-Thr-Leu-Lys-Gln-Ile-Ala-Ser-Lys-Leu-Arg-
    Glu-Gln-Phe-Gly-Asn-Asn-Lys-Thr-Ile-Ile-Phe-Lys-
    Gln-Ser-Ser-Gly-Gly-Asp-Pro-
aminoacids 397-424 inclusive, i.e. Cys-Asn-Ser-Thr-Gln-
    Leu-Phe-Asn-Ser-Thr-Trp-Phe-Asn-Ser-Thr-Trp-Ser-
    Thr-Glu-Gly-Ser-Asn-Asn-Thr-Glu-Gly-Ser-Asp-
aminoacids 466-500 inclusive, i.e. Leu-Thr-Arg-Asp-Gly-
    Gly-Asn-Asn-Asn-Asn-Gly-Ser-Glu-Ile-Phe-Arg-Pro-
    Gly-Gly-Gly-Asp-Met-Arg-Asp-Asn-Trp-Arg-Ser-Glu-
    Leu-Tyr-Lys-Tyr-Lys-Val-
aminoacids 510-523 inclusive, i.e. Pro-Thr-Lys-Ala-Lys-
    Arg-Arg-Val-Val-Gln-Arg-Glu-Lys-Arg-
aminoacids 551-577 inclusive, i.e. Val-Gln-Ala-Arg-Gln-
    Leu-Leu-Ser-Gly-Ile-Val-Gln-Gln-Gln-Asn-Asn-Leu-
    Leu-Arg-Ala-Ile-Glu-Ala-Gln-Gln-His-Leu-
aminoacids 594-603 inclusive, i.e. Ala-Val-Glu-Arg-Tyr-
    Leu-Lys-Asp-Gln-Gln-
aminoacids 621-630 inclusive, i.e. Pro-Trp-Asn-Ala-Ser-
    Trp-Ser-Asn-Lys-Ser-
aminoacids 657-679 inclusive, i.e. Leu-Ile-Glu-Glu-Ser-
    Gln-Asn-Gln-Gln-Glu-Lys-Asn-Glu-Gln-Glu-Leu-Leu-
    Glu-Leu-Asp-Lys-Trp-Ala-
aminoacids 719-758 inclusive, i.e. Arg-Val-Arg-Gln-Gly-
    Tyr-Ser-Pro-Leu-Ser-Phe-Gln-Thr-His-Leu-Pro-Thr-
    Pro-Arg-Gly-Pro-Asp-Arg-Pro-Glu-Gly-Ile-Glu-Glu-
    Glu-Gly-Gly-Glu-Arg-Asp-Arg-Asp-Arg-Ser-Ile-
aminoacids 780-803 inclusive, i.e. Tyr-His-Arg-Leu-Arg-
    Asp-Leu-Leu-Leu-Ile-Val-Thr-Arg-Ile-Val-Glu-Leu-
    Leu-Gly-Arg-Arg-Gly-Trp-Glu-

The invention also relates to any combination of 
these peptides. 
4) The other ORFs 

The invention further concerns DNA sequences which
provide open reading frames defined as ORF-Q, ORF-R and as
"1", "2", "3", "4", "5", the relative positions of which
appear more particularly in figs. 2 and 3.

These ORFs have the following locations:

          phase   start   stop
ORF-Q       1     4554    5162
ORF-R       2     8325    8972
ORF-1       1     5105    5392
ORF-2       2     5349    5591
ORF-3       1     5459    5692
ORF-4       2     5595    5849
ORF-5       1     8042    8355

ORFs Q and F



The viral (+) strand of the LAV genome was
found to contain the statutory retroviral genes encoding
the core structural proteins (gag), reverse transcriptase
(pol) and envelope protein (env), and two extra open
reading frames (orf) which we call Q and F (Table 1). The
genetic organization of LAV, 5' LTR-gag-pol-Q-env-F-3' LTR,
is unique. Whereas in all replication competent
retroviruses pol and env genes overlap, in LAV they are
separated by orf Q (192 amino acids) followed by four small
(<100 triplets) orf. The orf F (206 amino acids) slightly
overlaps the 3' end of env and is remarkable in that it is
half encoded by the U3 region of the LTR.
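
The start and stop coordinates listed above for the additional ORFs can be converted into lengths to see how they relate to the sizes quoted in this paragraph. The sketch below assumes that both coordinates are inclusive and that the phase, start and stop values pair up with the ORF names in the order listed; both points are assumptions, and the names used are illustrative.

```python
# (start, stop) nucleotide coordinates, assumed inclusive.
ORFS = {
    "ORF-Q": (4554, 5162),
    "ORF-R": (8325, 8972),
    "ORF-1": (5105, 5392),
    "ORF-2": (5349, 5591),
    "ORF-3": (5459, 5692),
    "ORF-4": (5595, 5849),
    "ORF-5": (8042, 8355),
}

def span(start, stop):
    """Return (length in nucleotides, whole triplets, leftover bases)."""
    length = stop - start + 1
    triplets, leftover = divmod(length, 3)
    return length, triplets, leftover

summary = {name: span(*coords) for name, coords in ORFS.items()}
small_orfs = [name for name in ("ORF-1", "ORF-2", "ORF-3", "ORF-4")
              if summary[name][1] < 100]
```

Under these assumptions ORF-1 to ORF-4 each hold fewer than 100 triplets, in line with the "four small" orfs mentioned above, and ORF-R spans 648 nucleotides, the length quoted later for open reading frame F, which suggests (the text does not say so explicitly) that ORF-R and orf F denote the same frame. ORF-5 leaves two bases over, so its coordinates may follow a different convention.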

Such a structure places LAV clearly apart
from previously sequenced retroviruses (Fig. 2). The (-)
strand is apparently non-coding. The additional Hind III
site of the LAV clone λJ81 (with respect to λJ19) maps to
the apparently non-coding region between Q and env
(positions 5166-5745). Starting at position 5501 is a
sequence (AAGCCT) which differs by a single base
(underlined) from the Hind III recognition sequence. It is
to be anticipated that many of the restriction site
polymorphisms between different isolates will map to this
region. Clone λJ81 has also been referred to in British
application Nr. 84 23659 filed on September 19, 1984.
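
The single-base difference mentioned above can be made explicit by comparing the two hexamers position by position. This is a small sketch; the Hind III recognition sequence AAGCTT is the standard one, and the helper name is illustrative.

```python
def mismatches(a, b):
    """Return the 1-based positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]

HIND_III_SITE = "AAGCTT"   # Hind III recognition sequence
NEAR_SITE = "AAGCCT"       # sequence starting at position 5501

diff = mismatches(HIND_III_SITE, NEAR_SITE)
```

The two hexamers differ only at their fifth base (T in the recognition site, C in the λJ19 sequence), so a single C to T change would create the additional site.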
Q and F : 

The nucleotide positions of their respective
extremities are given in Table 1 hereafter.

The location of orf Q is without precedent in
the structure of retroviruses. Orf F is unique in that it
is half encoded by the U3 element of the LTR. Both orfs
have "strong" initiator codons (Kozak 1984) near their 5'
ends and can encode proteins of 192 aminoacids (MWcalc =
22487) and 206 aminoacids (MWcalc = 23316) respectively.
Both putative proteins are hydrophilic (pQ 49 % polar,
15.1 % Arg + Lys; pF 46 % polar, 11 % Arg + Lys) and are
therefore unlikely to be associated directly with
membrane. The function of the putative proteins pQ and pF
cannot be predicted as no homology was found by screening
protein sequence data banks. Between orf F and the pX
protein of HTLV-1 there is no detectable homology.
Furthermore their hydrophobicity/hydrophilicity profiles
are completely different. It is known that retroviruses
can transduce cellular genes, notably proto-oncogenes
(Weinberg 1982). We suggest that orfs Q and F represent
exogenous genetic material and not some vestige of cellular
DNA because (I) LAV DNA does not hybridize to the human
genome under stringent conditions (Alizon et al., 1984),
(II) their codon usage is comparable to that of the gag,
pol and env genes (data not shown).

The organization of a reconstructed LTR and
viral flanking elements are shown schematically in Fig. 6.
The LTR is 638 bp long and displays usual features (Chen
and Barker 1984): (I) it is bounded by an inverted repeat
(5' ACTG) including the conserved TG dinucleotide (Temin
1981); (II) adjacent to the 5' LTR is the tRNA primer
binding site (PBS), complementary to tRNA(lys,3) (Raba et
al., 1979); (III) adjacent to the 3' LTR is a perfect 15 bp
polypurine tract. The other three polypurine tracts
observed between nucleotides 8200-8800 are not followed by
a sequence which is complementary to that just preceding
the PBS. The limits of the U5, R and U3 elements were
determined as follows. U5 is located between the PBS and
the polyadenylation site established from the sequence of
the 3' end of oligo(dT)-primed LAV cDNA (Alizon et al.,
1984). Thus U5 is 84 bp long. The length of R+U5 was
determined by synthesizing tRNA-primed LAV cDNA. After
alkaline hydrolysis of the primer, R+U5 was found to be
181 ± 1 bp. Thus R is 97 bp long and the capping site at
its 5' end can be located. Finally U3 is 456 bp long. The
LAV LTR also contains characteristic regulatory elements:
a polyadenylation signal sequence AATAAA 19 bp from the
R-U5 junction and the sequence ATATAAG, which is very
likely the TATA box, 22 bp 5' of the cap site. There are no
longer direct repeats within the LTR. Interestingly the
LAV LTR shows some similarities to that of the mouse
mammary tumour virus (MMTV) (Donehower et al., 1981). They
both use tRNA(lys) as a primer for (-) strand synthesis
whereas all other exogenous mammalian retroviruses known
to date use tRNA(pro) (Chen and Barker 1984). They possess
very similar polypurine tracts (that of LAV is
AAAAGAAAAGGGGGG while that of MMTV is AAAAAAGAAAAAGGGGG).
It is probable that the viral (+) strand synthesis is
discontinuous since the polypurine tract flanking the U3
element of the 3' LTR is found exactly duplicated in the
3' end of orf pol, at 4331-4336. In addition, MMTV and LAV
are exceptional in that the U3 element can encode an orf.
In the case of MMTV, U3 contains the whole orf while in
LAV, U3 contains 110 codons of the 3' half of orf F.
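
The way the R length is derived, and how the element sizes relate to the quoted LTR length, can be laid out as simple arithmetic. The snippet below is a sketch using only the figures given in the paragraph above; the variable names are illustrative.

```python
# Figures quoted in the text (base pairs).
U5 = 84
R_PLUS_U5 = 181
R_PLUS_U5_ERROR = 1      # the text gives 181 +/- 1 bp
U3 = 456
LTR = 638

# R follows from the measured R+U5 length minus U5.
r_length = R_PLUS_U5 - U5

# Sum of the three elements compared with the quoted LTR length.
ltr_from_parts = U3 + R_PLUS_U5
discrepancy = LTR - ltr_from_parts
within_error = abs(discrepancy) <= R_PLUS_U5_ERROR

# Length of the LAV polypurine tract quoted in the text.
lav_polypurine = "AAAAGAAAAGGGGGG"
tract_length = len(lav_polypurine)
```

The elements add up to 637 bp, one short of the quoted 638 bp, which is within the ± 1 bp uncertainty stated for R+U5; the quoted polypurine tract is 15 bases long, as stated.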

The LAV long terminal repeat (LTR) is
diagrammatically represented in Fig. 6. As mentioned, the
LTR was reconstructed from the sequence of λJ19 by
juxtaposing the sequences adjacent to the Hind III cloning
sites.

Sequencing of oligo(dT)-primed LAV DNA clone
pLAV75 (Alizon et al., 1984) rules out the possibility of
clustered Hind III sites in the R region of LAV. LTR are
limited by an inverted repeat sequence (IR). Both of the
viral elements flanking the LTR have been represented:
tRNA primer binding site (PBS) for 5' LTR and polypurine
tract (PU) for 3' LTR. Also indicated are a putative TATA
box, the cap site, polyadenylation signal (AATAAA) and
polyadenylation site (CAA). The location of the open
reading frame F (648 nucleotides) is shown above the LTR
schema.
The LTR (long terminal repeats) can also be
defined as lying between position 8560 and position 160
(end extending over position 9097/1). As a matter of fact
the end of the genome is at 9097 and, because of the LTR
structure of the retrovirus, links up with the beginning
of the sequence:

          Hind III
   CTCAATAAAGCTTGCCTTG

   9097 1
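
The short junction sequence shown above contains both the polyadenylation signal and the Hind III site, which can be confirmed with a simple substring search. This is a sketch; the helper name is illustrative, and the positions are counted within the 19-base string only, not in genome coordinates.

```python
def find_all(seq, motif):
    """Return 1-based start positions of every occurrence of motif in seq."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(motif, start + 1)   # allow overlapping hits
    return positions

junction = "CTCAATAAAGCTTGCCTTG"

polya_signal = find_all(junction, "AATAAA")   # polyadenylation signal
hind_iii = find_all(junction, "AAGCTT")       # Hind III recognition sequence
```

The AATAAA signal starts at base 4 of the string and the Hind III site at base 8, so the two motifs overlap by two bases.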

Table 1 sums up the locations and sizes of
viral open reading frames. The nucleotide coordinates
refer to the first base of the first triplet (1st
triplet), of the first methionine initiation codon (Met)
and of the stop codon (stop). The number of aminoacids
and calculated molecular weights are those calculated for
unmodified precursor products starting at the first
methionine through to the end, with the exception of pol
where the size and MW refer to that of the whole orf.
The invention concerns more particularly all
the DNA fragments which have been more specifically
referred to hereabove and which correspond to open reading
frames. It will be understood that the man skilled in the
art will be able to obtain them all, for instance by
cleaving an entire DNA corresponding to the complete
genome of a LAV species, such as by a partial or complete
digestion thereof with a suitable restriction enzyme and
by the subsequent recovery of the relevant fragments. The
different DNAs disclosed above can also be resorted to as
a source of suitable fragments. The techniques disclosed
hereafter for the isolation of the fragments which were
then included in the plasmids referred to hereabove and
which were then used for the DNA sequencing can be used.

Of course other methods can be used. Some of
them have been exemplified in British Application Nr.
8423659 filed on September 19, 1984. Reference is for
instance made to the following methods.

a) DNA can be transfected into mammalian
cells with appropriate selection markers by a variety of
techniques: calcium phosphate precipitation, polyethylene
glycol, protoplast fusion, etc.

b) DNA fragments corresponding to genes can
be cloned into expression vectors for E. coli, yeast or
mammalian cells and the resultant proteins purified.

c) The proviral DNA can be "shot-gunned"
(fragmented) into procaryotic expression vectors to
generate fusion polypeptides. Recombinants producing
antigenically competent fusion proteins can be identified
by simply screening the recombinants with antibodies
against LAV antigens.

The invention further refers more
specifically to DNA recombinants, particularly modified
vectors, including any of the preceding DNA sequences and
adapted to transform corresponding microorganisms or
cells, particularly eucaryotic cells such as yeasts, for
instance Saccharomyces cerevisiae, or higher eucaryotic
cells, particularly cells of mammals, and to permit
expression of said DNA sequences in the corresponding
microorganisms or cells. General methods of that type have
been recalled in the abovesaid British patent application
Nr. 8429099 filed on November 16, 1984.

More particularly the invention relates to
such DNA recombinant vectors modified by the abovesaid DNA
sequences and which are capable of transforming higher
eucaryotic cells, particularly mammalian cells. Preferably
any of the abovesaid sequences are placed under the direct
control of a promoter contained in said vectors and which
is recognized by the polymerases of said cells, such that
the first nucleotide codons expressed correspond to the
first triplets of the above-defined DNA sequences.
Accordingly this invention also relates to the
corresponding DNA fragments which can be obtained from LAV
genomes or corresponding cDNAs by any appropriate method.
For instance such a method comprises cleaving said LAV
genomes or cDNAs by restriction enzymes, preferably at the
level of restriction sites surrounding said fragments and
close to the opposite extremities respectively thereof,
recovering and identifying the fragments sought according
to sizes, if need be checking their restriction maps or
nucleotide sequences (or by reaction with monoclonal
antibodies specifically directed against epitopes carried
by the polypeptides encoded by said DNA fragments), and
further, if need be, trimming the extremities of the
fragment, for instance by an exonucleolytic enzyme such as
Bal31, for the purpose of controlling the desired
nucleotide sequences of the extremities of said DNA
fragments or, conversely, repairing said extremities with
Klenow enzyme and possibly ligating the latter to
synthetic polynucleotide fragments designed to permit the
reconstitution of the nucleotide extremities of said
fragments. Those fragments may then be inserted in any of
said vectors for causing the expression of the
corresponding polypeptide by the cell transformed
therewith. The corresponding polypeptide can then be
recovered from the transformed cells, if need be after
lysis thereof, and purified, by methods such as
electrophoresis. Needless to say, all conventional methods
for performing these operations can be resorted to.

The invention also relates more specifically
to cloned probes which can be made starting from any DNA
fragment according to this invention, thus to recombinant
DNAs containing such fragments, particularly any plasmids
amplifiable in procaryotic or eucaryotic cells and
carrying said fragments.

Using the cloned DNA fragments as a molecular
hybridization probe, either by labelling with
radionucleotides or with fluorescent reagents, LAV virion
RNA may be detected directly in the blood, body fluids and
blood products (e.g. antihemophilic factors such as
Factor VIII concentrates) and vaccines, e.g. hepatitis B
vaccine. It has already been shown that whole virus can be
detected in culture supernatants of LAV producing cells. A
suitable method for achieving that detection comprises
immobilizing virus onto a support, e.g. nitrocellulose
filters, etc., disrupting the virion and hybridizing with
labelled (radiolabelled or "cold" fluorescent- or
enzyme-labelled) probes. Such an approach has already been
developed for Hepatitis B virus in peripheral blood
(according to SCOTTO J. et al., Hepatology (1983), 3,
379-384).

Probes according to the invention can also be
used for rapid screening of genomic DNA derived from the
tissue of patients with LAV-related symptoms, to see if
the proviral DNA or RNA is present in host tissue and
other tissues.

A method which can be used for such screening
comprises the following steps: extraction of DNA from
tissue, restriction enzyme cleavage of said DNA,
electrophoresis of the fragments, Southern blotting of the
genomic DNA from tissues, and subsequent hybridization with
labelled cloned LAV proviral DNA. Hybridization in situ can
also be used.
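By way of illustration only, the cleavage and size-separation
steps of the screening method above can be sketched in Python.
The sequence, the choice of EcoRI (recognition site GAATTC, cut
after the first G) and the function names are assumptions made
for this example; they are not taken from the LAV genome or from
the procedure actually used.

```python
from typing import List

# Illustrative sketch only: the sequence and the enzyme are assumptions,
# not data from the LAV genome.


def digest(dna: str, site: str, cut_offset: int) -> List[str]:
    """Cut a linear DNA string at every occurrence of a recognition site.

    cut_offset is the position inside the site where the strand is cut
    (1 for EcoRI, which cuts G^AATTC).
    """
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])  # remainder after the last cut
    return fragments


def gel_order(fragments: List[str]) -> List[int]:
    """Return fragment lengths, longest first (slowest-migrating band first)."""
    return sorted((len(f) for f in fragments), reverse=True)


genomic = "ATGCGAATTCTTAGGCCGAATTCAAATGCCCGGGTTT"  # hypothetical sequence
pieces = digest(genomic, "GAATTC", 1)
print(gel_order(pieces))
```

The ordered lengths stand for the band pattern which would then
be blotted and hybridized with the labelled cloned probe.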

Lymphatic fluids and tissues and other
non-lymphatic tissues of humans, primates and other
mammalian species can also be screened to see if other
evolutionarily related retroviruses exist. The methods
referred to hereabove can be used, although hybridization
and washings would be done under non-stringent conditions.

The DNAs or DNA fragments according to the
invention can also be used for achieving the expression of
LAV viral antigens for diagnostic purposes.

The invention relates generally to the
polypeptides themselves, whether synthesized chemically,
isolated from viral preparations or expressed by the
different DNAs of the invention, particularly by the ORFs
or fragments thereof, in appropriate hosts, particularly
procaryotic or eucaryotic hosts, after transformation
thereof with a suitable vector previously modified by the
corresponding DNAs.

More generally, the invention also relates to
any of the polypeptide fragments (or molecules,
particularly glycoproteins having the same polypeptidic backbone
as the polypeptides mentioned hereabove) bearing an
epitope characteristic of a LAV protein or glycoprotein,
which polypeptide or molecule then has N-terminal and
C-terminal extremities respectively either free or,
independently from each other, covalently bonded to aminoacids
other than those which are normally associated with them
in the larger polypeptides or glycoproteins of the LAV
virus, which last mentioned aminoacids are then free or
belong to another polypeptidic sequence. Particularly the
invention relates to hybrid polypeptides containing any of



WO 86/02383 



PCT/EP85/00548 




the epitope-bearing-polypeptides which have been defined
more specifically hereabove, recombined with other
polypeptide fragments normally foreign to the LAV proteins,
having sizes sufficient to provide for an increased
immunogenicity of the epitope-bearing-polypeptide, said
foreign polypeptide fragments either being immunogenically
inert or not interfering with the immunogenic properties
of the epitope-bearing-polypeptide.

Such hybrid polypeptides, which may contain up
to 150, even 250 aminoacids, usually consist of the
expression products of a vector which contained ab initio a
nucleic acid sequence expressible under the control of a
suitable promoter or replicon in a suitable host, which
nucleic acid sequence had however beforehand been modified
by insertion therein of a DNA sequence encoding said
epitope-bearing-polypeptide.

Said epitope-bearing-polypeptides,
particularly those whose N-terminal and C-terminal aminoacids are
free, are also accessible by chemical synthesis, according
to techniques well known in the chemistry of proteins.

The synthesis of peptides in homogeneous
solution and in solid phase is well known.

In this respect, recourse may be had to the
method of synthesis in homogeneous solution described in
Houben-Weyl, "Methoden der Organischen Chemie" (Methods of
Organic Chemistry), edited by E. Wünsch, vol. 15-I and II,
Thieme, Stuttgart 1974.

This method of synthesis consists of
successively condensing either the successive aminoacids
in twos, in the appropriate order, or successive peptide
fragments previously available or formed and already
containing several aminoacyl residues in the appropriate
order. Except for the carboxyl and amino groups which will
be engaged in the formation of the peptide bonds, care must
be taken to protect beforehand all other reactive groups
borne by these aminoacyl groups or fragments. However,
prior to the formation of the peptide bonds, the carboxyl
groups are advantageously activated, according to methods
well known in the synthesis of peptides. Alternatively,
recourse may be had to coupling reactions bringing into
play conventional coupling reagents, for instance of the
carbodiimide type, such as
1-ethyl-3-(3-dimethylaminopropyl)-carbodiimide. When the
aminoacid used carries an additional amine group (e.g.
lysine) or another acid function (e.g. glutamic acid),
these groups may be protected by carbobenzoxy or
t-butyloxycarbonyl groups, as regards the amine groups, or
by t-butylester groups, as regards the carboxylic groups.
Similar procedures are available for the protection of
other reactive groups. For example, an SH group (e.g. in
cysteine) can be protected by an acetamidomethyl or
paramethoxybenzyl group.

In the case of progressive synthesis,
aminoacid by aminoacid, the synthesis starts preferably by the
condensation of the C-terminal aminoacid with the
aminoacid which corresponds to the neighboring aminoacyl group
in the desired sequence and so on, step by step, up to the
N-terminal aminoacid. Another preferred technique which can
be relied upon is that described by R.D. Merrifield in
"Solid phase peptide synthesis" (J. Am. Chem. Soc., 85,
2149-2154).

In accordance with the Merrifield process,
the first C-terminal aminoacid of the chain is fixed to a
suitable porous polymeric resin, by means of its
carboxylic group, the amino group of said aminoacid then being
protected, for example by a t-butyloxycarbonyl group.

When the first C-terminal aminoacid is thus
fixed to the resin, the protective group of the amine
group is removed by washing the resin with an acid, e.g.
trifluoroacetic acid when the protective group of the
amine group is a t-butyloxycarbonyl group.

Then the carboxylic group of the second
aminoacid, which is to provide the second aminoacyl group
of the desired peptidic sequence, is coupled to the
deprotected amine group of the C-terminal aminoacid fixed to
the resin. Preferably, the carboxyl group of this second
aminoacid has been activated, for example by
dicyclohexylcarbodiimide, while its amine group has been protected,
for example by a t-butyloxycarbonyl group. The first part
of the desired peptide chain, which comprises the first
two aminoacids, is thus obtained. As previously, the amine
group is then deprotected, and one can further proceed
with the fixing of the next aminoacyl group and so forth
until the whole peptide sought is obtained.

The protective groups of the different side
groups, if any, of the peptide chain so formed can then be
removed. The peptide sought can then be detached from the
resin, for example by means of hydrofluoric acid, and
finally recovered in pure form from the acid solution
according to conventional procedures.
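The cycle described above (fixing of the C-terminal aminoacid,
deprotection, coupling of the next protected aminoacid,
repetition, then release from the resin) can be written out as a
short Python sketch. The function name and the wording of each
step are illustrative assumptions; the reagents are only those
given as examples in the text, and the peptide
Leu-Phe-Gly-Asn-Asp-Pro-Ser is used as an arbitrary example.

```python
from typing import List


def merrifield_plan(peptide: str) -> List[str]:
    """List the steps of a stepwise solid-phase synthesis.

    peptide is written N-terminus first with three-letter codes
    separated by hyphens; the chain is built from the C-terminus.
    Hypothetical helper for illustration, not a laboratory protocol.
    """
    residues = [r for r in peptide.split("-") if r]
    steps = [f"fix Boc-{residues[-1]} to the resin through its carboxylic group"]
    # Each further residue needs one deprotection and one coupling.
    for residue in reversed(residues[:-1]):
        steps.append("remove the Boc group with trifluoroacetic acid")
        steps.append(
            f"couple Boc-{residue} (carboxyl activated by dicyclohexylcarbodiimide)"
        )
    steps.append("remove the Boc group with trifluoroacetic acid")
    steps.append("remove the side-group protections, if any")
    steps.append("detach the peptide from the resin with hydrofluoric acid")
    return steps


plan = merrifield_plan("Leu-Phe-Gly-Asn-Asp-Pro-Ser")
for number, step in enumerate(plan, start=1):
    print(number, step)
```

For a peptide of n aminoacids the sketch lists one fixing step,
n - 1 deprotection and coupling pairs, and three closing steps.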

As regards the peptide sequences of smallest
size and bearing an epitope or immunogenic determinant, and
more particularly those which are readily accessible by
chemical synthesis, it may be required, in order to
increase their in vivo immunogenic character, to couple or
"conjugate" them covalently to a physiologically
acceptable and non-toxic carrier molecule.

By way of examples of carrier molecules or
macromolecular supports which can be used for making the
conjugates according to the invention, there will be
mentioned natural proteins, such as tetanus toxoid,
ovalbumin, serum-albumins, hemocyanins, etc. Synthetic
macromolecular carriers, for example polylysines or
poly(D-L-alanine)-poly(L-lysine)s, can be used too.

Other types of macromolecular carriers which
can be used, which generally have molecular weights higher
than 20,000, are known from the literature.

The conjugates can be synthesized by known
processes, such as described by Frantz and Robertson in
"Infect. and Immunity", 33, 193-198 (1981), or by P.E.
Kauffman in Applied and Environmental Microbiology,
October 1981, Vol. 42, no. 4, 611-614.
For instance the following coupling agents
can be used: glutaric aldehyde, ethyl chloroformate,
water-soluble carbodiimides
(N-ethyl-N'-(3-dimethylaminopropyl)carbodiimide, HCl),
diisocyanates, bis-diazobenzidine, di- and
trichloro-s-triazines, cyanogen bromides, benzoquinone, as
well as coupling agents mentioned in Scand. J. Immunol.,
1978, vol. 8, p. 7-23 (Avrameas, Ternynck, Guesdon).

Any coupling process can be used for bonding
one or several reactive groups of the peptide, on the one
hand, and one or several reactive groups of the carrier,
on the other hand. Again coupling is advantageously
achieved between carboxyl and amine groups carried by the
peptide and the carrier or vice-versa in the presence of a
coupling agent of the type used in protein synthesis, e.g.
1-ethyl-3-(3-dimethylaminopropyl)-carbodiimide,
N-hydroxybenzotriazole, etc. Coupling between amine groups
respectively borne by the peptide and the carrier can also
be made with glutaraldehyde, for instance, according to
the method described by BOQUET, P. et al. (1982) Molec.
Immunol., 15, 1441-1549, when the carrier is hemocyanin.

The immunogenicity of epitope-bearing-peptides
can also be reinforced by oligomerisation
thereof, for example in the presence of glutaraldehyde or
any other suitable coupling agent. In particular, the
invention relates to the water-soluble immunogenic
oligomers thus obtained, comprising particularly from 2 to
10 monomer units.

The glycoproteins, proteins and polypeptides
(generally designated hereafter as "antigens") of this
invention, whether obtained in a purified state from LAV
virus preparations or - as concerns more particularly the
peptides - by chemical synthesis, are useful in processes
for the detection of the presence of anti-LAV antibodies
in biological media, particularly biological fluids such
as sera from man or animal, particularly with a view to
possibly diagnosing LAS or AIDS.

Particularly the invention relates to an in
vitro process of diagnosis making use of an envelope
glycoprotein (or of a polypeptide bearing an epitope of
this glycoprotein) for the detection of anti-LAV
antibodies in the serums of persons who carry them. Other
polypeptides - particularly those carrying an epitope of a
core protein - can be used too.

A preferred embodiment of the process of the
invention comprises:

- depositing a predetermined amount of one or several of
said antigens in the cups of a titration microplate;

- introducing increasing dilutions of the biological
fluid, i.e. the serum to be diagnosed, into these cups;

- incubating the microplate;

- washing the microplate carefully with an appropriate
buffer;

- adding into the cups specific labelled antibodies
directed against blood immunoglobulins; and

- detecting the antigen-antibody complex formed, which is
then indicative of the presence of LAV antibodies in the
biological fluid.

Advantageously the labelling of the
anti-immunoglobulin antibodies is achieved by an enzyme
selected from among those which are capable of hydrolysing
a substrate, which substrate undergoes a modification of
its radiation-absorption, at least within a predetermined
band of wavelengths. The detection of the substrate,
preferably comparatively with respect to a control, then
provides a measurement of the potential risks or of the
effective presence of the disease.

Thus preferred methods are immuno-enzymatic or
also immunofluorescent detections, in particular according
to the ELISA technique. Titrations may be determinations
by immunofluorescence or direct or indirect
immuno-enzymatic determinations. Quantitative titrations of
antibodies on the serums studied can be made.
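A quantitative titration of this kind can be sketched in Python
as follows. The sketch assumes a common convention which is not
stated in the text, namely a cutoff equal to the mean absorbance
of negative control serums plus three standard deviations; the
absorbance values, the dilutions and the function names are
invented for the illustration.

```python
from statistics import mean, stdev
from typing import Dict, List, Optional


def cutoff_from_controls(negative_controls: List[float], n_sd: float = 3.0) -> float:
    """Assumed convention: mean of the negative controls plus n_sd deviations."""
    return mean(negative_controls) + n_sd * stdev(negative_controls)


def endpoint_titer(readings: Dict[int, float], cutoff: float) -> Optional[int]:
    """Return the last reciprocal dilution still above the cutoff.

    readings maps reciprocal dilution (100 means 1/100) to absorbance.
    The scan stops at the first dilution that falls to or below the cutoff.
    """
    titer = None
    for dilution in sorted(readings):
        if readings[dilution] > cutoff:
            titer = dilution
        else:
            break
    return titer


# Hypothetical absorbance readings, for illustration only.
negative_controls = [0.08, 0.10, 0.09]
serum = {100: 1.20, 200: 0.85, 400: 0.46, 800: 0.21, 1600: 0.11, 3200: 0.09}

cutoff = cutoff_from_controls(negative_controls)
print(endpoint_titer(serum, cutoff))
```

A serum whose first dilution is already below the cutoff yields
no titer (None) and would be read as negative under this
assumed convention.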

The invention also relates to the diagnostic
kits themselves for the in vitro detection of antibodies
against the LAV virus, which kits comprise any of the
polypeptides identified herein, and all the biological and
chemical reagents, as well as equipment, necessary for
performing diagnostic assays. Preferred kits comprise all
reagents required for carrying out ELISA assays. Thus
preferred kits will include, in addition to any of said
polypeptides, suitable buffers and anti-human
immunoglobulins, which anti-human immunoglobulins are labelled
either by an immunofluorescent molecule or by an enzyme.
In the last instance preferred kits then also comprise a
substrate hydrolysable by the enzyme and providing a
signal, particularly modified absorption of a radiation,
at least in a determined wavelength, which signal is then
indicative of the presence of antibody in the biological
fluid to be assayed with said kit.

The invention also relates to vaccine
compositions whose active principle is to be constituted by
any of the antigens, i.e. the hereabove disclosed
polypeptides, whole antigens, particularly the purified gp110
or immunogenic fragments thereof, fusion polypeptides or
oligopeptides, in association with a suitable
pharmaceutical or physiologically acceptable carrier.

A first type of preferred active principle is
the gp110 immunogen.

Other preferred active principles to be
considered in that field consist of the peptides containing
less than 250 aminoacid units, preferably less than 150,
as deducible from the complete genomes of LAV, and even
more preferably those peptides which contain one or more
groups selected from Asn-X-Ser and Asn-X-Thr as defined
above. Preferred peptides for use in the production of
vaccinating principles are peptides (a) to (f) as defined
above. By way of example having no limitative character,
there may be mentioned that suitable dosages of the
vaccine compositions are those which are effective to
elicit antibodies in vivo, in the host, particularly a
human host. Suitable doses range from 10 to 500 micrograms
of polypeptide, protein or glycoprotein per kg, for
instance 50 to 100 micrograms per kg.
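Whether a candidate peptide contains a group of the Asn-X-Ser
type can be checked mechanically. The Python sketch below is an
illustration under stated assumptions: it also accepts Thr in
the third position, following the usual N-glycosylation
convention, and it treats X as any aminoacid, since the
definition referred to as "above" is not reproduced here. The
function name is hypothetical, and the example peptide is one
of the sequences listed in the claims.

```python
from typing import List, Tuple


def find_sequons(peptide: str) -> List[Tuple[int, str]]:
    """Locate Asn-X-Ser and Asn-X-Thr groups in a hyphen-separated peptide.

    Returns (position of the Asn within the peptide, counted from 1,
    matching triplet). X is taken to be any aminoacid (assumption).
    """
    residues = [r for r in peptide.split("-") if r]
    hits = []
    for i in range(len(residues) - 2):
        if residues[i] == "Asn" and residues[i + 2] in ("Ser", "Thr"):
            hits.append((i + 1, "-".join(residues[i:i + 3])))
    return hits


example = (
    "Cys-Asn-Ser-Thr-Gln-Leu-Phe-Asn-Ser-Thr-Trp-Phe-Asn-Ser-Thr-"
    "Trp-Ser-Thr-Glu-Gly-Ser-Asn-Asn-Thr-Glu-Gly-Ser-Asp"
)
for position, triplet in find_sequons(example):
    print(position, triplet)
```

Positions are relative to the first aminoacid of the peptide
supplied, not to the numbering of the full protein.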

The different peptides according to this
invention can also be used themselves for the production
of antibodies, preferably monoclonal antibodies specific
for the different peptides respectively. For the production
of hybridomas secreting said monoclonal antibodies,
conventional production and screening methods are used. These
monoclonal antibodies, which themselves are part of the
invention, then provide very useful tools for the
identification and even determination of relative
proportions of the different polypeptides or proteins in
biological samples, particularly human samples containing
LAV or related viruses.

The invention further relates to the hosts
(procaryotic or eucaryotic cells) which are transformed by
the above mentioned recombinants and which are capable of
expressing said DNA fragments.

Finally, the invention also concerns vectors
for the transformation of eucaryotic cells of human
origin, particularly lymphocytes, the polymerases of which
are capable of recognizing the LTRs of LAV. Particularly
said vectors are characterized by the presence of a LAV
LTR therein, said LTR being then active as a promoter
enabling the efficient transcription and translation in a
suitable host of a DNA insert coding for a determined
protein placed under its control.

Needless to say, the invention extends to
all variants of genomes and corresponding DNA fragments
(ORFs) having substantially equivalent properties, all of
said genomes belonging to retroviruses which can be
considered as equivalents of LAV.

It must be understood that the claims which
follow are also intended to cover all equivalents of the
products (glycoproteins, polypeptides, DNAs, etc.),
whereby an equivalent is a product, e.g. a polypeptide,
which may differ from a determined one defined in any
of said claims, say through one or several aminoacids,
while still having substantially the same immunological or
immunogenic properties. A similar rule of equivalency
shall apply to the DNAs, it being understood that the rule
of equivalency will then be tied to the rule of
equivalency pertaining to the polypeptides which they encode.

It will also be understood that all the
literature referred to hereinbefore or hereinafter, and
all patent applications or patents not specifically
identified herein but which form counterparts of those
specifically designated herein must be considered as
incorporated herein by reference.

CLAIMS

1. A purified product which contains the
polypeptidic backbone of a glycoprotein having a molecular
weight of about 110,000 or antigen of lower molecular
weight derived from the preceding one, which purified
product possesses the capacity of being recognised by
serums of human origin and containing antibodies against
the LAV virus.

2. The purified product of claim 1 which is
the purified glycoprotein.

3. The glycoprotein of claim 2 which forms
complexes with concanavalin A, said complex being
dissociable in the presence of O-methyl-α-D-mannopyranoside.

4. The glycoprotein of claim 2 or claim 3,
which binds lectins.

5. The glycoprotein of any one of claims
which is also sensitive to the action of endoglycosidases.

6. The purified product of any one of claims
1 to 5 which has the polypeptidic backbone of the
polypeptide encoded by the nucleic acid fragment extending
between nucleotide numbered 6421 and nucleotide numbered
<> of figure 1.

7. The purified product of claim 1 which is a
polypeptide corresponding to any of those encoded by the
nucleotide sequences which extend respectively between the
following positions:

a) from about 6171 to about 6276
b) from about 6336 to about 6386
c) from about 6466 to about 6516
d) from about 6561 to about 6696
e) from about 6936 to about 7006
f) from about 7611 to about 7646

which nucleotide sequences can be deduced from the LAV DNA
shown in figs. 4a-4e.

8. The purified product of claim 1 which is a
peptide, containing a sequence of aminoacids deducible
from the proteins encoded by the LAV DNA, which peptide is
selected from the group of polypeptides defined hereafter,
the numbers associated with each peptide corresponding to
the positions of its N-terminal and C-terminal aminoacids
starting from lysine (aminoacid 1) coded by the AAA at
positions 5734-5748 of the LAV DNA shown in figs. 4a-4e:

aminoacids 8-23 inclusive, i.e. Met-Arg-Val-Lys-Glu-Lys-
    Tyr-Gln-His-Leu-Trp-Arg-Trp-Gly-Trp-Lys-

aminoacids 63-78 inclusive, i.e. Ser-Asp-Ala-Lys-Ala-Tyr-
    Asp-Thr-Glu-Val-His-Asn-Val-Trp-Ala-Thr-

aminoacids 82-90 inclusive, i.e. Val-Pro-Thr-Asp-Pro-Asn-
    Pro-Gln-Glu-

aminoacids 97-123 inclusive, i.e. Thr-Glu-Asn-Phe-Asn-
    Met-Trp-Lys-Asn-Asp-Met-Val-Glu-Gln-
    Met-His-Glu-Asp-Ile-Ile-Ser-Leu-Trp-Asp-
    Gln-Ser-Leu-

aminoacids 127-183 inclusive, i.e. Val-Lys-Leu-Thr-Pro-
    Leu-Cys-Val-Ser-Leu-Lys-Cys-Thr-Asp-
    Leu-Gly-Asn-Ala-Thr-Asn-Thr-Asn-Ser-Ser-Asn-
    Thr-Asn-Ser-Ser-Ser-Gly-Glu-Met-Met-Met-Glu-
    Lys-Gly-Glu-Ile-Lys-Asn-Cys-Ser-Phe-Asn-Ile-
    Ser-Thr-Ser-Ile-Arg-Gly-Lys-Val-Gln-Lys-

aminoacids 197-201 inclusive, i.e. Leu-Asp-Ile-Ile-Pro-
    Ile-Asp-Asn-Asp-Thr-Thr-

aminoacids 239-294 inclusive, i.e. Lys-Cys-Asn-Asn-Lys-
    Thr-Phe-Asn-Gly-Thr-Gly-Pro-Cys-Thr-
    Asn-Val-Ser-Thr-Val-Gln-Cys-Thr-His-Gly-
    Ile-Arg-Pro-Val-Val-Ser-Thr-Gln-Leu-Leu-
    Leu-Asn-Gly-Ser-Leu-Ala-Glu-Glu-Glu-Val-
    Val-Ile-Arg-Ser-Ala-Asn-Phe-Thr-Asp-Asn-
    Ala-Lys-

aminoacids 300-327 inclusive, i.e. Leu-Asn-Gln-Ser-Val-Glu-
    Ile-Asn-Cys-Thr-Arg-Pro-Asn-Asn-Asn-Thr-Arg-
    Lys-Ser-Ile-Arg-Ile-Gln-Arg-Gly-Pro-Gly-Arg-

aminoacids 334-381 inclusive, i.e. Lys-Ile-Gly-Asn-Met-
    Arg-Gln-Ala-His-Cys-Asn-Ile-Ser-Arg-Ala-
    Lys-Trp-Asn-Ala-Thr-Leu-Lys-Gln-Ile-Ala-
    Ser-Lys-Leu-Arg-Glu-Gln-Phe-Gly-Asn-Asn-Lys-
    Thr-Ile-Ile-Phe-Lys-Gln-Ser-Ser-Gly-
    Gly-Asp-Pro-

aminoacids 397-424 inclusive, i.e. Cys-Asn-Ser-Thr-Gln-
    Leu-Phe-Asn-Ser-Thr-Trp-Phe-Asn-Ser-Thr-
    Trp-Ser-Thr-Glu-Gly-Ser-Asn-Asn-Thr-Glu-Gly-
    Ser-Asp-

aminoacids 466-500 inclusive, i.e. Leu-Thr-Arg-Asp-Gly-
    Gly-Asn-Asn-Asn-Asn-Gly-Ser-Glu-Ile-Phe-
    Arg-Pro-Gly-Gly-Gly-Asp-Met-Arg-Asp-Asn-Trp-
    Arg-Ser-Glu-Leu-Tyr-Lys-Tyr-Lys-Val-

aminoacids 510-523 inclusive, i.e. Pro-Thr-Lys-Ala-Lys-
    Arg-Arg-Val-Val-Gln-Arg-Glu-Lys-Arg-

aminoacids 551-577 inclusive, i.e. Val-Gln-Ala-Arg-Gln-
    Leu-Leu-Ser-Gly-Ile-Val-Gln-Gln-Gln-Asn-
    Asn-Leu-Leu-Arg-Ala-Ile-Glu-Ala-Gln-
    Gln-His-Leu-

aminoacids 594-603 inclusive, i.e. Ala-Val-Glu-Arg-Tyr-
    Leu-Lys-Asp-Gln-Gln-

aminoacids 621-630 inclusive, i.e. Pro-Trp-Asn-Ala-Ser-
    Trp-Ser-Asn-Lys-Ser-

aminoacids 657-679 inclusive, i.e. Leu-Ile-Glu-Glu-Ser-
    Gln-Asn-Gln-Gln-Glu-Lys-Asn-Glu-Gln-Glu-
    Leu-Leu-Glu-Leu-Asp-Lys-Trp-Ala-

aminoacids 719-758 inclusive, i.e. Arg-Val-Arg-Gln-Gly-
    Tyr-Ser-Pro-Leu-Ser-Phe-Gln-Thr-His-Leu-
    Pro-Thr-Pro-Arg-Gly-Pro-Asp-Arg-Pro-Glu-
    Gly-Ile-Glu-Glu-Glu-Gly-Gly-Glu-Arg-Asp-Arg-
    Asp-Arg-Ser-Ile-

aminoacids 780-803 inclusive, i.e. Tyr-His-Arg-Leu-Arg-
    Asp-Leu-Leu-Leu-Ile-Val-Thr-Arg-Ile-Val-
    Glu-Leu-Leu-Gly-Arg-Arg-Gly-Trp-Glu-

or any combination of these peptides.

9. Peptide corresponding to the aminoacid
sequences deducible from proteins encoded by LAV DNA,
which peptide is selected from the group of polypeptides
defined hereafter, the numbers associated with each
peptide corresponding to the relative positions of its
N-terminal and C-terminal aminoacids starting from
methionine (aminoacid 1) coded by the ATG sequence at
nucleotide positions 336-338 of the LAV DNA shown in figs.
4a-4e:

aminoacids 12-32 inclusive, i.e. Glu-Leu-Asp-Arg-Trp-Glu-
    Lys-Ile-Arg-Leu-Arg-Pro-Gly-Gly-Lys-
    Lys-Lys-Tyr-Lys-Leu-Lys-

aminoacids 37-46 inclusive, i.e. Ala-Ser-Arg-Glu-Leu-Glu-
    Arg-Phe-Ala-Val-

aminoacids 49-79 inclusive, i.e. Gly-Leu-Leu-Glu-Thr-Ser-
    Glu-Gly-Cys-Arg-Gln-Ile-Leu-Gly-Gln-
    Leu-Gln-Pro-Ser-Leu-Gln-Thr-Gly-Ser-Glu-
    Glu-Leu-Arg-Ser-Leu-Tyr-

aminoacids 88-153 inclusive, i.e. Val-His-Gln-Arg-Ile-
    Glu-Ile-Lys-Asp-Thr-Lys-Glu-Ala-Leu-
    Asp-Lys-Ile-Glu-Glu-Glu-Gln-Asn-Lys-Ser-
    Lys-Lys-Lys-Ala-Gln-Gln-Ala-Ala-Ala-Asp-
    Thr-Gly-His-Ser-Ser-Gln-Val-Ser-Gln-Asn-
    Tyr-Pro-Ile-Val-Gln-Asn-Ile-Gln-Gly-Gln-
    Met-Val-His-Gln-Ala-Ile-Ser-Pro-Arg-Thr-
    Leu-Asn-

aminoacids 158-165 inclusive, i.e. Val-Val-Glu-Glu-
    Lys-Ala-Phe-Ser-

aminoacids 178-188 inclusive, i.e. Gly-Ala-Thr-Pro-Gln-
    Asp-Leu-Asn-Thr-Met-Leu-

aminoacids 200-220 inclusive, i.e. Met-Leu-Lys-Glu-Thr-
    Ile-Asn-Glu-Glu-Ala-Ala-Glu-Trp-Asp-Arg-
    Val-His-Pro-Val-His-Ala-

aminoacids 226-234 inclusive, i.e. Gly-Gln-Met-Arg-Glu-
    Pro-Arg-Gly-Ser-

aminoacids 239-264 inclusive, i.e. Thr-Thr-Ser-Thr-Leu-
    Gln-Glu-Gln-Ile-Gly-Trp-Met-Thr-Asn-Asn-Pro-
    Pro-Ile-Pro-Val-Gly-Glu-Ile-Tyr-Lys-Arg-

aminoacids 288-331 inclusive, i.e. Gly-Pro-Lys-Glu-Pro-
    Phe-Arg-Asp-Tyr-Val-Asp-Arg-Phe-Tyr-Lys-
    Thr-Leu-Arg-Ala-Glu-Gln-Ala-Ser-Gln-Glu-
    Val-Lys-Asn-Trp-Met-Thr-Glu-Thr-Leu-
    Leu-Val-Gln-Asn-Ala-Asn-Pro-Asp-Cys-Lys-

aminoacids 352-361 inclusive, i.e. Gly-Val-Gly-Gly-Pro-
    Gly-His-Lys-Ala-Arg-

aminoacids 377-390 inclusive, i.e. Met-Met-Gln-Arg-Gly-
    Asn-Phe-Arg-Asn-Gln-Arg-Lys-Ile-Val-

aminoacids 399-432 inclusive, i.e. Gly-His-Ile-Ala-Arg-
    Asn-Cys-Arg-Ala-Pro-Arg-Lys-Lys-Gly-
    Cys-Trp-Lys-Cys-Gly-Lys-Glu-Gly-His-Gln-Met-
    Lys-Asp-Cys-Thr-Glu-Arg-Gln-Ala-Asn-

aminoacids 437-484 inclusive, i.e. Ile-Trp-Pro-Ser-Tyr-
    Lys-Gly-Arg-Pro-Gly-Asn-Phe-Leu-Gln-Ser-Arg-
    Pro-Glu-Pro-Thr-Ala-Pro-Pro-Glu-Glu-Ser-Phe-
    Arg-Ser-Gly-Val-Glu-Thr-Thr-Thr-Pro-Ser-Gln-
    Lys-Gln-Glu-Pro-Ile-Asp-Lys-Glu-Leu-Tyr-

aminoacids 492-498 inclusive, i.e. Leu-Phe-Gly-Asn-Asp-
    Pro-Ser-

and combinations of said peptides.

10. Process for obtaining the purified
product of any one of claims 1 to 6, which process
comprises starting from a LAV virus obtained from the
supernatant of a culture of cells infected therewith and
purified under conditions such that it still carries a
substantial proportion of envelope antigens, lysing the virus,
recovering the antigens released in the supernatant and
separating the purified product from other components of
the lysate.

11. The process of claim 10 wherein the
starting virus has been purified by centrifugation
operations, which centrifugation operations comprise
particularly a first centrifugation at an angular
centrifugation velocity, particularly of 10,000 rpm,
enabling the removal of non-viral constituents, more
particularly of cellular constituents, then a second
centrifugation at higher angular velocity, particularly at
45,000 rpm, to obtain the precipitation of the virus
itself.

12. A process for the production of the
purified product of any of claims 1 to 9 which comprises
transforming a cell culture with a vector modified by a LAV
DNA sequence encoding the corresponding polypeptide, which
cell culture is capable of expressing said LAV DNA
sequence, recovering the expression products containing the
product of claims 1 to 9 of said cell culture and
separating the product of claims 1 to 9 from the other
expression products.

13. The process of any of claims 10, 11 or
claim 12 in which the separation of the purified product
is by contacting with monoclonal antibodies specifically
recognizing a protein, polypeptide or glycoprotein
according to any one of claims 1 to 8.

14. A method for the detection of
antibodies to the LAV virus in a biological fluid,
particularly a human serum, which comprises contacting said
biological fluid with the product of any of claims 1 to 9
under conditions suitable for enabling a complex between
said antibodies and said product to be formed and
detecting said complex as indicative of the presence of said
antibodies in said biological fluid.

15. An immunogenic composition, containing
the purified product of any of claims 1 to 9 in
association with a pharmaceutical vehicle suitable for the
production of vaccines, which purified product is in an
amount effective to elicit the production of antibodies
directed against said product.